Abstract

Change in the coefficients or in the mean of the innovation of an INAR(p) process is a sign of disturbance that is important to detect. The proposed methods can test for change in any one of these quantities separately, or in any collection of them. They make both one-sided and two-sided tests possible; furthermore, they can be used to test against the 'epidemic' alternative. The tests are based on a CUSUM process using CLS estimators of the parameters. Under the one-sided and two-sided alternatives consistency of the tests is proved and the properties of the change-point estimator are also explored.
1 Introduction

A time-inhomogeneous INAR(p) process is a sequence $(X_k)_{k \geq -p+1}$ given by

$$X_k = \sum_{j=1}^{X_{k-1}} \xi_{1,k,j} + \cdots + \sum_{j=1}^{X_{k-p}} \xi_{p,k,j} + \varepsilon_k, \qquad k \in \mathbb{N}, \tag{1.1}$$

where $\{\varepsilon_k : k \in \mathbb{N}\}$ is a sequence of independent non-negative integer-valued random variables, for each $k \in \mathbb{N}$ and $i \in \{1, \ldots, p\}$ the sequence $\{\xi_{i,k,j} : j \in \mathbb{N}\}$ is a sequence of i.i.d. Bernoulli random variables with mean $\alpha_{i,k}$ such that these sequences are mutually independent and independent of the sequence $\{\varepsilon_k : k \in \mathbb{N}\}$, and $X_0, \ldots, X_{-p+1}$ are non-negative integer-valued random variables independent of the sequences $\{\xi_{i,k,j} : j \in \mathbb{N}\}$, $k \in \mathbb{N}$, $i \in \{1, \ldots, p\}$, and $\{\varepsilon_k : k \in \mathbb{N}\}$. The numbers $\alpha_{i,k}$ are called coefficients, and we will refer to $\varepsilon_1, \varepsilon_2, \ldots$ as the innovations. Time-homogeneous INAR(p) processes have a number of applications, which are summarized, e.g., in Barczy et al. (2011).
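To make the recursion (1.1) concrete, the following minimal sketch simulates a time-homogeneous INAR(p) path using only the Python standard library. The Poisson innovation law, the zero initial values, and the names `poisson` and `simulate_inar` are assumptions made for illustration; the model itself does not prescribe an innovation distribution.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar(alphas, mu, n, seed=0):
    """Simulate X_1, ..., X_n from a time-homogeneous INAR(p) process.

    alphas : thinning probabilities (alpha_1, ..., alpha_p)
    mu     : mean of the innovations, taken to be Poisson here
             (an assumption of this sketch, not of the model)
    The initial values X_0, ..., X_{-p+1} are set to 0 for simplicity.
    """
    rng = random.Random(seed)
    history = [0] * len(alphas)      # history[i] holds X_{k-1-i}
    path = []
    for _ in range(n):
        x = poisson(rng, mu)         # innovation eps_k
        for alpha, parent in zip(alphas, history):
            # binomial thinning: a sum of `parent` Bernoulli(alpha) variables
            x += sum(rng.random() < alpha for _ in range(parent))
        path.append(x)
        history = [x] + history[:-1]
    return path


if __name__ == "__main__":
    xs = simulate_inar([0.3, 0.2], mu=2.0, n=2000, seed=1)
    # compare the sample mean with mu / (1 - alpha_1 - alpha_2)
    print(sum(xs) / len(xs), 2.0 / (1 - 0.5))
```

In the stable case the sample mean of a long path should be close to $\mu / (1 - \alpha_1 - \cdots - \alpha_p)$, which gives a quick sanity check on the simulator.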

The reason that we initially define our process as time-inhomogeneous is that we would like to test for 
a change in the parameters, therefore we have to allow them to vary over time. In the proofs, however, a 
majority of the results will be based upon the properties of time-homogeneous INAR(p) processes. 
In applications the time series underlying an observation is usually supposed to be homogeneous, and this can lead to false conclusions if the parameters have changed during the time of the observation. Therefore testing for a change in the parameters has been an important question. The monograph of Csorgo and Horvath (1997) gives an excellent overview of the subject for many time series models, and in their own context, their main results are stronger than those of this paper: e.g., in place of Theorem 4.2 they usually prove convergence in distribution under a variety of conditions. Their methods, however, cannot be directly used to handle INAR(p) processes, since they routinely involve a Taylor expansion of the likelihood function, which we cannot do because we are not assuming that the innovation distribution comes from a parametric family. Furthermore, the autoregressive processes investigated in Csorgo and Horvath (1997) have the usual type of autoregression, i.e., the next element of the time series is a measurable function of the previous elements, plus an independent innovation. In our case, however, the next element depends stochastically on the previous elements as well. Therefore, it is more fruitful to consider our model as a special multitype branching process. This will be our approach as well, but we remark that much of the motivation for the paper is due to Csorgo and Horvath (1997) and Gombay (2008). We note that our Theorem 3.1 and Theorem 4.1 are weaker than the equivalent versions of Theorems 1 and 2 in Gombay (2008); this is due to our model being more complicated. In our continued research we plan to improve our results in this direction. Similarly, in place of Theorem 4.2 we hope to show that the difference $\hat\tau_n - \tau$ converges in distribution, as it was shown in Csorgo and Horvath (1997) for the simpler models. Finally, we remark that it is indeed proper to take motivation from these sources because our process resembles an AR(p) process in its covariance structure.

Change detection methods for INAR(p) processes in general (i.e., with no prespecified innovation distribution) have only been proposed in a few papers. We refer especially to Kang and Lee (2009), where the authors give a test statistic similar to ours for a more general model. However, no result is available under the alternative hypothesis and the asymptotics of the change-point estimator are not given. We will give results on these questions, which strengthen the theoretical foundations of the method considerably.

The test has already been published in T. Szabo (2011b). However, that paper only states the main result and does not provide the proof. It also contains no results under the alternative hypothesis, which are the main results of this paper. In the appendix we will also provide the (rather standard) proof that is missing from T. Szabo (2011b).

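Now we proceed with the formulation of the statistical problem. We assume $\mu_k := E(\varepsilon_k) < \infty$ and $0 < \sigma_k^2 := \operatorname{Var}(\varepsilon_k) < \infty$. Write the parameter vectors as

$$\theta_k := (\alpha_{1,k}, \ldots, \alpha_{p,k}, \mu_k)^\top,$$

and let us choose a subset $PV$ of $\{1, 2, \ldots, p+1\}$ such that

$$PV = \{i_1, i_2, \ldots, i_\ell\} \quad \text{for some } \ell > 0, \quad i_1 < i_2 < \cdots < i_\ell. \tag{1.2}$$

Also, we can write

$$NV := PV^c = \{j_1, j_2, \ldots, j_{p+1-\ell}\}, \qquad j_1 < j_2 < \cdots < j_{p+1-\ell},$$

where $^c$ denotes the complement of a set. Let us now define

$$\psi_k := \left(\theta_k^{(i_1)}, \ldots, \theta_k^{(i_\ell)}\right)^\top, \qquad \eta_k := \left(\theta_k^{(j_1)}, \ldots, \theta_k^{(j_{p+1-\ell})}\right)^\top.$$

The vector $\psi_k$ is the parameter vector of interest and $\eta_k$ is the 'nuisance' parameter vector. For a fixed number of observations $n$ we want to test the null hypothesis

$H_0$: $\varepsilon_1, \ldots, \varepsilon_n$ are identically distributed and $\theta_1 = \theta_2 = \cdots = \theta_n$

against the alternative

$H_A$: there is an integer $\tau \in \{1, \ldots, n-1\}$ such that
$\psi_1 = \cdots = \psi_\tau \neq \psi_{\tau+1} = \cdots = \psi_n$ but $\eta_1 = \cdots = \eta_n$,
$\varepsilon_1, \ldots, \varepsilon_\tau$ are identically distributed,
and $\varepsilon_{\tau+1}, \ldots, \varepsilon_n$ are identically distributed.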



The main difficulty is that the change is not in a location or scaling parameter as in most of the change 
detection models, and standard techniques based on likelihood are not applicable. 
Under the null hypothesis $H_0$ the parameters do not depend on $k$, and we write

$$\boldsymbol\alpha := (\alpha_1, \ldots, \alpha_p)^\top := (\alpha_{1,1}, \ldots, \alpha_{p,1})^\top, \qquad \mu := \mu_1, \qquad \sigma^2 := \sigma_1^2, \qquad \theta := (\alpha_1, \ldots, \alpha_p, \mu)^\top,$$

and the process (1.1) is referred to as stable, unstable or explosive whenever $\alpha_1 + \cdots + \alpha_p < 1$, $\alpha_1 + \cdots + \alpha_p = 1$ or $\alpha_1 + \cdots + \alpha_p > 1$, respectively. Basic differences between the three types are summarized in Barczy et al. (2011). We will study only the stable case $\alpha_1 + \cdots + \alpha_p < 1$ with nonvanishing innovation (i.e. with $\mu > 0$), when the Markov chain $(\mathbf{X}_k)_{k \geq 0}$ given by

$$\mathbf{X}_k := (X_k, X_{k-1}, \ldots, X_{k-p+1})^\top, \qquad k \geq 0,$$

has a nondegenerate, unique stationary distribution (see e.g. Quine (1970)). Under the alternative hypothesis we will require that the process be stable both before and after the change. This assumption is central to our proofs, because if the process is not stable, then it is not ergodic either and we cannot use any of the methods outlined below. For unstable and explosive processes completely different tools should be developed.

The structure of the paper is the following. In the second section we propose a CUSUM-like test process 
similar to that of Kang and Lee (2009), in the third section we compute its limiting distribution and, based 
on the process, we introduce both one-sided and two-sided tests. For simulations on the power of these tests 
we refer the reader to T. Szabo (2011b). Our main results under the alternative hypothesis are stated in 
Section 4 and a real data illustration is given in Section 5. Appendix A contains detailed proofs under the null 
hypothesis, while Appendix B contains the proofs under the alternative hypothesis. The paper is concluded 
by Appendix C, which contains some necessary technical lemmas and calculations. Unless otherwise noted, 
we understand convergence as $n \to \infty$. We will denote the set of positive integers by $\mathbb{N}$ and the set of nonnegative integers by $\mathbb{N}_0$.
2 Parameter estimation and the construction of our test process

2.1 Parameter estimates

The first step in parameter estimation is introducing the sequence of martingale differences $(M_k)_{k \in \mathbb{N}}$:

$$M_k := X_k - E(X_k \mid \mathcal{F}_{k-1}) = X_k - \boldsymbol\alpha^\top \mathbf{X}_{k-1} - \mu, \qquad k \in \mathbb{N}, \tag{2.1}$$

where $(\mathcal{F}_n)_{n \in \mathbb{N}}$ is the natural filtration. The conditional least squares estimators of the parameters, first introduced by Klimko and Nelson (1978), can be calculated by minimizing the sum of squares

$$R_n(\alpha_1, \ldots, \alpha_p, \mu) := \frac{1}{2} \sum_{k=1}^n M_k^2 = \frac{1}{2} \sum_{k=1}^n \left(X_k - \boldsymbol\alpha^\top \mathbf{X}_{k-1} - \mu\right)^2$$

with respect to $\alpha_1, \ldots, \alpha_p, \mu$. With a reasoning completely analogous to that of Lemma 3.1 and Proposition 3.1 in Barczy, Ispany, and Pap (2012) we can show that $R_n$ has a unique minimum given by (2.2) whenever $Q_n$ is invertible. We will show in C.1 that this is true with an asymptotic probability of 1, therefore the parameter estimates exist and are unique with an asymptotic probability of 1. As all our results are asymptotic, this will be sufficient for the purposes of the paper.

$$\hat\theta_n := Q_n^{-1} \sum_{k=1}^n X_k \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}, \qquad Q_n := \sum_{k=1}^n \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix} \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}^\top. \tag{2.2}$$
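The estimator (2.2) is an ordinary least squares fit of $X_k$ on $(X_{k-1}, \ldots, X_{k-p}, 1)$. A minimal sketch of the computation follows, assuming the observations are stored in a plain list whose first $p$ entries serve as initial values; the function names `solve` and `cls_estimate` are illustrative and not taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("Q_n is (numerically) singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def cls_estimate(xs, p):
    """Return [alpha_1, ..., alpha_p, mu] minimising the CLS sum of squares.

    xs[0:p] play the role of the initial values; xs[p:] are X_1, ..., X_n.
    """
    dim = p + 1
    q = [[0.0] * dim for _ in range(dim)]    # the matrix Q_n
    rhs = [0.0] * dim                        # sum of X_k * (X_{k-1}, ..., X_{k-p}, 1)
    for k in range(p, len(xs)):
        z = [float(xs[k - i]) for i in range(1, p + 1)] + [1.0]
        for i in range(dim):
            rhs[i] += xs[k] * z[i]
            for j in range(dim):
                q[i][j] += z[i] * z[j]
    return solve(q, rhs)
```

The sketch raises an error when $Q_n$ is singular, which corresponds to the event of asymptotically vanishing probability on which the estimator is not defined.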



We note that this estimator is strongly consistent under the null hypothesis (Du & Li, 1991), and the estimation procedure itself supposes that the null hypothesis is valid; however, the calculations can be carried out under the alternative hypothesis as well. Under the alternative hypothesis the weak limit of $\hat\theta_n$ is given in Lemma B.3. Replacing the parameters by their estimates in $M_k$ we obtain $\widehat{M}_k^{(n)}$, i.e.,

$$\widehat{M}_k^{(n)} := X_k - \hat\theta_n^\top \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}. \tag{2.3}$$

Although not a parameter in which we are looking for change, the estimate of the variance of the innovation $\sigma^2$ will also appear in our test process, therefore we have to provide an estimator for it. To do this, we introduce

$$N_k := M_k^2 - E(M_k^2 \mid \mathcal{F}_{k-1}) = M_k^2 - \alpha_1(1 - \alpha_1) X_{k-1} - \cdots - \alpha_p(1 - \alpha_p) X_{k-p} - \sigma^2, \qquad k > 0.$$

Minimizing $\sum_{k=1}^n N_k^2$ with respect to $\sigma^2$ we obtain the conditional least squares estimate

$$\tilde\sigma_n^2 := \frac{1}{n} \sum_{k=1}^n \left(M_k^2 - \alpha_1(1 - \alpha_1) X_{k-1} - \cdots - \alpha_p(1 - \alpha_p) X_{k-p}\right). \tag{2.4}$$

However, in this estimate the true parameters are still present. The estimate that we will use is given by replacing the $\alpha$ coefficients and $\mu$ both in the formula and in $M_k$ by their estimates:

$$\hat\sigma_n^2 := \frac{1}{n} \sum_{k=1}^n \left(\left(\widehat{M}_k^{(n)}\right)^2 - \hat\alpha_1^{(n)}\left(1 - \hat\alpha_1^{(n)}\right) X_{k-1} - \cdots - \hat\alpha_p^{(n)}\left(1 - \hat\alpha_p^{(n)}\right) X_{k-p}\right). \tag{2.5}$$

2.2 Construction of the test process

We use a formal analogy between the INAR(p) process and the well-known AR(p) process (Venkataraman, 1982) to obtain analogues of score vector and information quantities as in Gombay (2008). We briefly recall the motivation of the test process as given in T. Szabo (2011b). Due to the martingale central limit theorem,

$$\left(\frac{1}{\sqrt{n}} \sum_{k=1}^{\lfloor nt \rfloor} M_k\right)_{t \in [0,1]} \xrightarrow{\mathcal{D}} \left(\sqrt{c}\, W_t\right)_{t \in [0,1]},$$

where $c$ is a constant depending on $\theta$ and $\sigma^2$, and $(W_t)_{0 \leq t \leq 1}$ is a standard Brownian motion. Therefore, by a rough approximation

$$(M_1, \ldots, M_n) \sim N(0, c E_n),$$

where $E_n$ is the $n \times n$ identity matrix. The approximate likelihood function is

$$(2\pi c)^{-n/2} \exp\left(-\frac{1}{2c} \sum_{k=1}^n M_k^2\right).$$

We will take the derivative of the log-likelihood function and work with that quantity. The first term will be regarded as constant. This is a simplification because $c$ actually depends on the parameters but taking this into account leads to calculations that are difficult to handle. Also, we will not take into account the constant factor before the sum of the $M_k^2$ but will rather work with the analogue of the information matrix. Therefore, we consider the following analogue of the log-likelihood function:

$$R_n(\alpha_1, \ldots, \alpha_p, \mu) = \frac{1}{2} \sum_{k=1}^n M_k^2.$$
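Note that this is the sum that we had to minimize for CLS estimation. The role of the score vector will be played by

$$-\nabla R_k(\hat\theta_n) = \sum_{j=1}^k \widehat{M}_j^{(n)} \begin{bmatrix} \mathbf{X}_{j-1} \\ 1 \end{bmatrix}.$$

The information matrix $I_n$ is defined by

$$I_n := \sum_{k=1}^n E\left[\{\nabla R_k(\theta) - \nabla R_{k-1}(\theta)\}\{\nabla R_k(\theta) - \nabla R_{k-1}(\theta)\}^\top \,\middle|\, \mathcal{F}_{k-1}\right] = \sum_{k=1}^n \left(\boldsymbol\alpha_2^\top \mathbf{X}_{k-1} + \sigma^2\right) \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix} \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}^\top$$

with

$$\boldsymbol\alpha_2 := [\alpha_1(1 - \alpha_1), \ldots, \alpha_p(1 - \alpha_p)]^\top. \tag{2.6}$$

The estimate $\hat{I}_n$ is defined by replacing in $I_n$ the variance and all the parameters in $\boldsymbol\alpha_2$ by their CLS estimates. This leads to the $(p+1)$-dimensional test process $(\mathcal{M}_n(t))_{0 \leq t \leq 1}$ given by

$$\mathcal{M}_n(t) := \hat{I}_n^{-1/2} \sum_{k=1}^{\lfloor nt \rfloor} \widehat{M}_k^{(n)} \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}. \tag{2.7}$$

Note that the process $(\mathcal{M}_n(t))_{0 \leq t \leq 1}$ can also be written in the CUSUM form

$$\mathcal{M}_n(t) = \hat{I}_n^{-1/2} \left(\sum_{k=1}^{\lfloor nt \rfloor} X_k \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix} - \sum_{k=1}^{\lfloor nt \rfloor} \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix} \begin{bmatrix} \mathbf{X}_{k-1} \\ 1 \end{bmatrix}^\top \hat\theta_n\right) = \hat{I}_n^{-1/2} Q_{\lfloor nt \rfloor} \left(\hat\theta_{\lfloor nt \rfloor} - \hat\theta_n\right).$$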



This is close to the CUSUM processes used by Kang and Lee (2009). In that paper the authors investigated a change in random coefficient INAR(p) processes based on both maximum-likelihood and conditional least squares estimators. A general convergence theorem was proved for a test process similar to ours based on a broader class of estimators. However, the authors did not state any results under the alternative hypothesis. Our Theorem 3.1 is analogous to Theorem 1 of Kang and Lee (2009). However, while their model is more general than ours, our Theorem 3.1 is not a consequence of their Theorem 1, even though the applied methods are largely identical.
3 Testing procedures

Definition. A time-homogeneous INAR(p) process $(X_k)_{k \geq -p+1}$ is said to satisfy condition $C_0$, if $E(X_0^6) < \infty, \ldots, E(X_{-p+1}^6) < \infty$, $E(\varepsilon_1^6) < \infty$, $\alpha_1 + \cdots + \alpha_p < 1$, $\mu > 0$ all hold for it, and if, furthermore, $\alpha_1 + \cdots + \alpha_p > 0$ or $\sigma^2 > 0$.

An INAR(p) process $(X_k)_{k \geq -p+1}$ which satisfies $H_A$ is said to satisfy condition $C_A$, if $(X_k)_{-p+1 \leq k \leq \tau}$ and $(X_k)_{k \geq \tau+1}$ satisfy condition $C_0$.

Under the null hypothesis we have the following result, which allows the construction of various test statistics.

Theorem 3.1. If $(X_k)_{k \geq -p+1}$ satisfies $H_0$ and condition $C_0$, then

$$\mathcal{M}_n \xrightarrow{\mathcal{D}} \mathcal{B} \quad \text{as } n \to \infty,$$

where $(\mathcal{B}(t))_{0 \leq t \leq 1}$ is a $(p+1)$-dimensional standard Brownian bridge, and $\xrightarrow{\mathcal{D}}$ denotes convergence in distribution in the Skorokhod space $D([0, 1])$.

By the continuous mapping theorem we obtain the following corollary.

Corollary 3.2. Under the assumptions of Theorem 3.1 we have

$$\sup_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) \xrightarrow{\mathcal{D}} \sup_{0 \leq t \leq 1} B(t), \tag{3.1}$$

$$\inf_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) \xrightarrow{\mathcal{D}} \inf_{0 \leq t \leq 1} B(t), \tag{3.2}$$

$$\sup_{0 \leq t \leq 1} \left|\mathcal{M}_n^{(i)}(t)\right| \xrightarrow{\mathcal{D}} \sup_{0 \leq t \leq 1} |B(t)|, \tag{3.3}$$

$$\sup_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) - \inf_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) \xrightarrow{\mathcal{D}} \sup_{0 \leq t \leq 1} B(t) - \inf_{0 \leq t \leq 1} B(t) \tag{3.4}$$

as $n \to \infty$, where $(\mathcal{M}_n^{(i)}(t))_{0 \leq t \leq 1}$, $i = 1, \ldots, p+1$, denotes the components of $(\mathcal{M}_n(t))_{0 \leq t \leq 1}$, and $(B(t))_{0 \leq t \leq 1}$ is a Brownian bridge.

Theorem 3.1 is proved in Appendix A (see A.5).

Since $(\mathcal{B}(t))_{0 \leq t \leq 1}$ in Theorem 3.1 has independent components, we need to define the tests componentwise only. For a simultaneous test for change in $d$ parameters, to have an overall level of significance $\alpha$, we use $\alpha^* := 1 - (1 - \alpha)^{1/d}$ for each component. We can test for change in a single component $\theta^{(i)}$, $i \in PV$ (with $\theta^{(i)} = \alpha_i$ for $i = 1, 2, \ldots, p$ and $\theta^{(p+1)} = \mu$, according to the definition of $\theta$), and three different tests can be constructed for this purpose.
Test 1 (one-sided): If

$$\sup_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) > C_1(\alpha^*) \quad \text{or} \quad \inf_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) < -C_1(\alpha^*),$$

then we conclude that there was a downward or upward change in parameter $\theta^{(i)}$ (respectively) along the sequence $X_0, X_1, \ldots, X_n$.

Test 2 (two-sided): If

$$\sup_{0 \leq t \leq 1} \left|\mathcal{M}_n^{(i)}(t)\right| \geq C_2(\alpha^*),$$

then we conclude that there was a change in parameter $\theta^{(i)}$ along the sequence $X_0, X_1, \ldots, X_n$.

The third test is designed against the so-called epidemic alternative (see Csorgo & Horvath, 1997, 1.7.4) and is included for the sake of completeness, following Gombay (2008). No results will be proved under this alternative hypothesis.

Test 3: If

$$\sup_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) - \inf_{0 \leq t \leq 1} \mathcal{M}_n^{(i)}(t) > C_3(\alpha^*),$$

then we conclude that there was a temporary change in parameter $\theta^{(i)}$ along the sequence $X_0, X_1, \ldots, X_n$.

Critical values are obtained from the limit distributions in Corollary 3.2, namely, from the identities

$$P\left(\sup_{0 \leq t \leq 1} B(t) \geq x\right) = e^{-2x^2}, \qquad x \geq 0,$$

$$P\left(\sup_{0 \leq t \leq 1} |B(t)| \geq x\right) = 2 \sum_{k=1}^\infty (-1)^{k+1} e^{-2k^2x^2}, \qquad x \geq 0,$$

$$P\left(\sup_{0 \leq t \leq 1} B(t) - \inf_{0 \leq t \leq 1} B(t) \leq x\right) = 1 - 2 \sum_{k=1}^\infty (4k^2x^2 - 1) e^{-2k^2x^2},$$

respectively, where $(B(t))_{0 \leq t \leq 1}$ is a Brownian bridge (for the third relationship, see Kuiper, 1960).
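The critical values $C_1$ and $C_2$ can be computed numerically from the first two identities. The sketch below is one possible way to do this: it inverts the one-sided formula in closed form and finds the two-sided value by bisection on a truncated series. The function names, the truncation at 100 terms and the bisection bracket are assumptions of this example.

```python
import math


def sidak_level(alpha, d):
    """Per-component level alpha* = 1 - (1 - alpha)^(1/d)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / d)


def one_sided_critical(level):
    """Solve exp(-2 x^2) = level for x."""
    return math.sqrt(-math.log(level) / 2.0)


def two_sided_tail(x, terms=100):
    """P(sup |B(t)| >= x) from the (truncated) alternating series, x > 0."""
    return 2.0 * sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
                     for k in range(1, terms + 1))


def two_sided_critical(level, lo=0.3, hi=5.0):
    """Bisection on the decreasing function two_sided_tail."""
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if two_sided_tail(mid) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for d in (2, 3):
        level = sidak_level(0.05, d)
        print(d, level, two_sided_critical(level))
```

For an overall level of 0.05 and $d = 2$ or $d = 3$ components, the two-sided values obtained this way agree, after rounding, with the critical values 1.48 and 1.545 used in Section 5.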



4 The test under the alternative hypothesis 

In this section we will state the two main results of this paper which extend the results of Huskova, Praskova, 
and Steinebach (2007) and Gombay (2008) to the INAR(p) process. 
4.1 Consistency of the test

The following theorem, the analogue of Theorem 3.1 in Huskova et al. (2007), describes the behaviour of the maximum of the test process if a change occurs in the mean of the innovation. An immediate consequence of the theorem is that the maximum of the process tends to infinity stochastically as $n \to \infty$, which suffices for the weak consistency of the proposed test.

Theorem 4.1. Suppose that $H_A$ holds with $\tau = \max(\lfloor n\rho \rfloor, 1)$, $\rho \in (0, 1)$, and the change is only in the innovation mean, namely, the innovation mean changes from $\mu'$ to $\mu''$, where $\mu' > \mu'' > 0$. Suppose

that $E(X_0^6) < \infty, \ldots, E(X_{-p+1}^6) < \infty$, $E(\varepsilon_1^6) < \infty$, $E(\varepsilon_{\tau+1}^6) < \infty$, $\alpha_1 + \cdots + \alpha_p < 1$ and $\mu > 0$ hold

both before and after the change. Assume that ai-{ — ■ + ap > or > 0. Then for any 7 € (O, jj we 
have 



max , 



in) 



nV' + Op(n^ '•') as n — >■ 00, 



with 



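$$\psi := \rho(1 - \rho)(\mu' - \mu'')\, e_{p+1}^\top C'' Q^{-1} C' e_{p+1} > 0,$$

where $e_{p+1}$ is the $(p+1)$-st ($(p+1)$-dimensional) unit vector,

$$C' := E\left(\begin{bmatrix} \tilde{\mathbf{X}}' \\ 1 \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{X}}' \\ 1 \end{bmatrix}^\top\right), \qquad C'' := E\left(\begin{bmatrix} \tilde{\mathbf{X}}'' \\ 1 \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{X}}'' \\ 1 \end{bmatrix}^\top\right),$$

where the distributions of

$$\tilde{\mathbf{X}}' = \left(\tilde{X}'_0, \ldots, \tilde{X}'_{-p+1}\right)^\top, \qquad \tilde{\mathbf{X}}'' = \left(\tilde{X}''_0, \ldots, \tilde{X}''_{-p+1}\right)^\top$$

are the unique stationary distributions of the Markov chains $(\mathbf{X}_k)_{0 \leq k \leq \lfloor n\rho \rfloor}$ and $(\mathbf{X}_k)_{\lfloor n\rho \rfloor + 1 \leq k \leq n}$, respectively, and

$$Q := \rho C' + (1 - \rho) C''.$$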

Remark. Theorem 4.1 can be reformulated for a change in the other parameters as well. In other cases we have to investigate the maximum or minimum of $\sum_{j=1}^k \widehat{M}_j^{(n)} X_{j-q}$ for some $1 \leq q \leq p$. This will be explored in more detail in B.5. One warning is given here for emphasis, although it will be repeated later on, namely, the one-sided test should only be used if we are certain that only one parameter of the process has changed.
4.2 Estimation of the change point

Based on the score vector analogy described in Appendix A, the estimator of $\tau$ is

$$\hat\tau_n := \min\left\{k \in \{1, \ldots, n\} : \sum_{j=1}^k \widehat{M}_j^{(n)} = \max_{1 \leq \ell \leq n} \sum_{j=1}^\ell \widehat{M}_j^{(n)}\right\} \tag{4.1}$$

for the downward one-sided test,

$$\hat\tau_n := \min\left\{k \in \{1, \ldots, n\} : \sum_{j=1}^k \widehat{M}_j^{(n)} = \min_{1 \leq \ell \leq n} \sum_{j=1}^\ell \widehat{M}_j^{(n)}\right\} \tag{4.2}$$

for the upward one-sided test, and

$$\hat\tau_n := \min\left\{k \in \{1, \ldots, n\} : \left|\sum_{j=1}^k \widehat{M}_j^{(n)}\right| = \max_{1 \leq \ell \leq n} \left|\sum_{j=1}^\ell \widehat{M}_j^{(n)}\right|\right\} \tag{4.3}$$

for the two-sided test.

If there is a change in the coefficient $\alpha_q$, then the estimator of $\tau$ is based on the process $\sum_{j=1}^k \widehat{M}_j^{(n)} X_{j-q}$, $k = 1, \ldots, n$.

Theorem 4.2. Under the assumptions of Theorem 4.1, we have

$$\hat\tau_n - \lfloor n\rho \rfloor = O_P(1) \quad \text{as } n \to \infty,$$

where $\hat\tau_n$ is defined by (4.1).

Remark. We used the change-point estimator from (4.1) because in Theorem 4.1, the mean $\mu$ changes downward. If it changed upward, then we should apply the estimator from (4.2) because it corresponds to the appropriate one-sided test. The estimator from (4.3) corresponds to the two-sided test and satisfies the statement of Theorem 4.2 regardless of the direction of the change. The proof of this is analogous to the proof of Theorem 4.2 but even more technical, therefore it will be omitted.

Remark. Similarly to Theorem 4.1, Theorem 4.2 holds for a change in other parameters as well. We also note that the result is slightly stronger than the similar Proposition 3.1 in Huskova et al. (2007). Similar results are valid for change in a location parameter (see Csorgo & Horvath, 1997), and in these cases the limit distribution is nondegenerate. Therefore we can conjecture that Theorem 4.2 cannot be improved upon in terms of convergence rate.
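The estimators (4.1)-(4.3) amount to locating the first extremum of a partial-sum path. A minimal sketch follows, assuming the residuals $\widehat{M}_j^{(n)}$ (or the products $\widehat{M}_j^{(n)} X_{j-q}$ when a coefficient is tested) have already been computed and stored in a list; the function name `change_point` and the `kind` labels are illustrative.

```python
from itertools import accumulate


def change_point(resid, kind="two-sided"):
    """First index k in 1..n at which the partial-sum path attains its
    maximum ("down", as in (4.1)), its minimum ("up", as in (4.2)) or its
    maximum absolute value ("two-sided", as in (4.3))."""
    sums = list(accumulate(resid))
    if kind == "down":
        target = max(sums)
    elif kind == "up":
        target = min(sums)
    elif kind == "two-sided":
        sums = [abs(s) for s in sums]
        target = max(sums)
    else:
        raise ValueError(kind)
    # list.index returns the first position, matching the "min" in (4.1)-(4.3)
    return sums.index(target) + 1
```

For instance, the residual sequence 1, 1, 1, -1, -1, -1 has partial sums 1, 2, 3, 2, 1, 0, so the downward and two-sided estimators both return 3.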
Theorem 4.1 and Theorem 4.2 are proved in Appendix B. If we would like to prove different versions of these theorems (e.g., if we postulate a change in a parameter other than $\mu$), we need to slightly modify the proofs in several points. Any points where these modifications are not straightforward will be indicated in B.5.



5 Illustration 

Now we provide two real data examples of the use of our method. Since our model includes initial values, the series were not investigated in their full length, but the first $p$ values were taken as the initial values.

Our first example is the dataset of monthly polio cases in the US, as reported by the Centers for Disease Control and Prevention. It is available online at Hyndman (n.d.) and has length 166. In Kang and Lee (2009) the authors found a significant decreasing trend in this series, while in Davis and Wu (2009) and Davis et al. (2000) the trend was found insignificant. It is widely agreed (see also Silva, 2005) that the underlying process is first-order, which is also supported by the partial autocorrelation function. Therefore we treated it as an INAR(1) process and calculated the CLS estimates given by (2.2). They were $\hat\alpha_1 = 0.30646$ and $\hat\mu = 0.94091$. The maximum of the absolute value of $\mathcal{M}_n^{(1)}$ was 1.2647 and the maximum of the absolute value of $\mathcal{M}_n^{(2)}$ was 1.1232. Applying the two-sided test simultaneously to the two parameters and requiring an overall significance level of 0.05, the critical value for each component is 1.48 (the individual significance levels are $1 - \sqrt{0.95} \approx 0.0253$); therefore, the null hypothesis is not rejected.

Our second example is a dataset of public drunkenness intakes in Minneapolis, also accessible at Hyndman (n.d.). This dataset has length 139. After an examination of the partial autocorrelation function a seasonal INAR(12) model seems a rational choice, but with the assumption that only $\alpha_1$ and $\alpha_{12}$ are nonzero (for another similar calculation, see the real data section in Barczy et al., 2011). The estimates are

$$\begin{bmatrix} \hat\alpha_1 \\ \hat\alpha_{12} \\ \hat\mu \end{bmatrix} = \left(\sum_{k=1}^n \begin{bmatrix} X_{k-1} \\ X_{k-12} \\ 1 \end{bmatrix} \begin{bmatrix} X_{k-1} \\ X_{k-12} \\ 1 \end{bmatrix}^\top\right)^{-1} \sum_{k=1}^n X_k \begin{bmatrix} X_{k-1} \\ X_{k-12} \\ 1 \end{bmatrix} = \begin{bmatrix} 0.8154 \\ 0.1419 \\ 9.6944 \end{bmatrix}.$$

The maxima of the absolute values of the respective components of $\mathcal{M}_n$ are 2.0333, 1.3497 and 1.5788. A comparison with the critical value of 1.545 (individual significance of approximately 0.017) results in the rejection of the null hypothesis. Based on $\left|\sum_{j=1}^k \widehat{M}_j^{(n)} X_{j-1}\right|$, $k = 1, \ldots, n$, our estimate for the change point is 41 (i.e., the 53rd entry in the original series). Repeating the procedure for the series before and after the change, the null hypothesis is not rejected for either of them. For the series after the change, the CLS estimate of $\alpha_{12}$ is negative, but an inspection of the partial autocorrelation function reveals that this series is more appropriately modeled as an INAR(1) process, for which the parameter estimates are $\hat\alpha_1 = 0.8915$ and $\hat\mu = 24.8429$ and the null hypothesis is not rejected.



A The process under the null hypothesis 

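In this section we will give some results for the process under the null hypothesis, including a lemma for ergodic convergence rate. The conditional least squares estimators will be calculated and we will give details on the calculation of the test process (2.7).

A.1 Regression equations

The INAR(p) process is formally analogous to the AR(p) process. To exploit this analogy we need to state several regression equations for the process. First we write the equivalent of (1.1) for the vector-valued process $(\mathbf{X}_k)_{k \in \mathbb{N}}$:

$$\mathbf{X}_k = \sum_{j=1}^{X_{k-1}} \boldsymbol\xi_{1,k,j} + \cdots + \sum_{j=1}^{X_{k-p}} \boldsymbol\xi_{p,k,j} + \boldsymbol\varepsilon_k, \tag{A.1}$$

where

$$\boldsymbol\xi_{i,k,j} := \xi_{i,k,j}\, e_1 + e_{i+1} \quad (1 \leq i \leq p-1), \qquad \boldsymbol\xi_{p,k,j} := \xi_{p,k,j}\, e_1, \qquad \boldsymbol\varepsilon_k := \varepsilon_k\, e_1,$$

and $e_i$ denotes the $i$-th $p$-dimensional unit vector.

This form makes it even more apparent that the INAR(p) process is a special multitype branching process with immigration. According to standard literature (see, e.g., Quine, 1970), if the matrix

$$A := \begin{bmatrix} E(\boldsymbol\xi_{1,1,1}) & \cdots & E(\boldsymbol\xi_{p,1,1}) \end{bmatrix} = \begin{bmatrix} \alpha_1 & \cdots & \alpha_{p-1} & \alpha_p \\ 1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 \end{bmatrix} \tag{A.2}$$

is primitive (i.e., some power of it is elementwise positive), the ergodicity of the process depends only on the spectral radius $\varrho(A)$, and the process is ergodic if $\varrho(A) < 1$. In Barczy et al. (2011), (2.7), it is shown that this is equivalent to the condition that $\alpha_1 + \cdots + \alpha_p < 1$.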
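The equivalence between $\varrho(A) < 1$ and $\alpha_1 + \cdots + \alpha_p < 1$ can be checked numerically on examples. The sketch below builds the matrix $A$ of (A.2) and approximates its spectral radius via Gelfand's formula $\varrho(A) = \lim_k \|A^k\|^{1/k}$ with repeated squaring. The function names and the choice $k = 2^8$ are assumptions of this example, and the approximation underflows for very small spectral radii.

```python
def companion(alphas):
    """Matrix A of (A.2): first row alphas, ones on the subdiagonal."""
    p = len(alphas)
    rows = [list(map(float, alphas))]
    for i in range(p - 1):
        rows.append([1.0 if j == i else 0.0 for j in range(p)])
    return rows


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def spectral_radius(a, doublings=8):
    """Approximate rho(A) by ||A^k||^(1/k) with k = 2**doublings.

    Uses the maximum row sum norm; intended for illustration only.
    """
    power = [row[:] for row in a]
    for _ in range(doublings):
        power = matmul(power, power)
    norm = max(sum(abs(x) for x in row) for row in power)
    return norm ** (1.0 / 2 ** doublings)


if __name__ == "__main__":
    for alphas in ([0.3, 0.2], [0.8, 0.5]):
        print(alphas, sum(alphas), spectral_radius(companion(alphas)))
```

For $p = 2$ the exact value is the larger root of $\lambda^2 - \alpha_1 \lambda - \alpha_2 = 0$, which for $(\alpha_1, \alpha_2) = (0.3, 0.2)$ is approximately 0.6217.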

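Recalling the martingale differences $M_k$ from (2.1) we can write

$$\mathbf{X}_k = A \mathbf{X}_{k-1} + (\mu + M_k) e_1, \tag{A.3}$$

where $e_1$ is the first unit vector. Based on (A.3) we obtain

$$\begin{aligned} \mathbf{X}_k^{\otimes 2} &= (A \mathbf{X}_{k-1})^{\otimes 2} + ((\mu + M_k) e_1)^{\otimes 2} + (A \mathbf{X}_{k-1}) \otimes ((\mu + M_k) e_1) + ((\mu + M_k) e_1) \otimes (A \mathbf{X}_{k-1}) \\ &= A^{\otimes 2} \mathbf{X}_{k-1}^{\otimes 2} + (\mu + M_k)^2 e_1^{\otimes 2} + (\mu + M_k)(A \mathbf{X}_{k-1}) \otimes e_1 + (\mu + M_k)\, e_1 \otimes (A \mathbf{X}_{k-1}), \end{aligned} \tag{A.4}$$

where $\otimes$ denotes the Kronecker product of matrices.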
A.2 Ergodicity

Under the null hypothesis let us denote by $\tilde{\mathbf{X}}$ a random vector with the unique stationary distribution of $(\mathbf{X}_k)_{k \in \mathbb{N}}$. Because our process is ergodic, we can apply the ergodic theorem. In its well-known form it states that if $E(|g(\tilde{\mathbf{X}})|) < \infty$ for some function $g$, then

$$\frac{1}{n} \sum_{k=1}^n g(\mathbf{X}_k) \xrightarrow{\text{a.s.}} E(g(\tilde{\mathbf{X}})). \tag{A.5}$$

This is, for example, Theorem 2 in I.15 in Chung (1960). However, instead of the convergence of averages, we will frequently require the convergence of expectations, i.e.,

$$E(\mathbf{X}_k^{\otimes \beta}) \to E(\tilde{\mathbf{X}}^{\otimes \beta}), \qquad \beta \in \mathbb{N}, \tag{A.6}$$

whenever the right hand side is finite. This is Theorem 14.0.1 in Meyn and Tweedie (2009). The result in (A.6) also implies convergence of any component of the matrices. Under the alternative hypothesis we will additionally apply

(A.7)

This result can be found in Orey (1971, Theorem 4.3) or in Meyn and Tweedie (2009, Theorem 13.1.2). The convergence rate in (A.6) can be estimated by the following lemma.
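Lemma A.1. There is a constant $\pi \in (0, 1)$ such that

$$\left\|E(\mathbf{X}_k) - E(\tilde{\mathbf{X}})\right\| = O(\pi^k), \qquad \left\|E(\mathbf{X}_k^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2})\right\| = O(\pi^k).$$

Proof. We use (A.3) to conclude that

$$E(\mathbf{X}_k) = A E(\mathbf{X}_{k-1}) + \mu e_1.$$

Taking the limits as $k \to \infty$ we have

$$E(\tilde{\mathbf{X}}) = A E(\tilde{\mathbf{X}}) + \mu e_1,$$

hence

$$E(\mathbf{X}_k) - E(\tilde{\mathbf{X}}) = A\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right). \tag{A.8}$$

Similarly, from (A.4) and $E(M_k^2 \mid \mathcal{F}_{k-1}) = \boldsymbol\alpha_2^\top \mathbf{X}_{k-1} + \sigma^2$ we have

$$\begin{aligned} E(\mathbf{X}_k^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2}) &= A^{\otimes 2}\left(E(\mathbf{X}_{k-1}^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2})\right) + \left(\boldsymbol\alpha_2^\top\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right)\right) e_1^{\otimes 2} \\ &\quad + \mu\left(A\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right)\right) \otimes e_1 + \mu\, e_1 \otimes \left(A\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right)\right) \\ &= A^{\otimes 2}\left(E(\mathbf{X}_{k-1}^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2})\right) + e_1^{\otimes 2} \boldsymbol\alpha_2^\top\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right) \\ &\quad + \mu(A \otimes e_1)\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right) + \mu(e_1 \otimes A)\left(E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}})\right). \end{aligned} \tag{A.9}$$

Here we used the fact that for any $c \in \mathbb{R}$ and real vector $v$ we have $cv = vc$, where the second multiplication is a proper matrix product. Furthermore, we used the following property of the Kronecker product: for any matrices $A, B, C, D$ of compatible sizes we have $(AB) \otimes (CD) = (A \otimes C)(B \otimes D)$; specifically, if $C$ is a column vector, $(AB) \otimes C = (AB) \otimes (C \cdot 1) = (A \otimes C)(B \otimes 1) = (A \otimes C)B$ (this identity can also be used when the first factor consists of a single factor instead of the second). Hence,

$$\begin{bmatrix} E(\mathbf{X}_k) - E(\tilde{\mathbf{X}}) \\ E(\mathbf{X}_k^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2}) \end{bmatrix} = \begin{bmatrix} A & 0 \\ e_1^{\otimes 2} \boldsymbol\alpha_2^\top + \mu(A \otimes e_1) + \mu(e_1 \otimes A) & A^{\otimes 2} \end{bmatrix} \begin{bmatrix} E(\mathbf{X}_{k-1}) - E(\tilde{\mathbf{X}}) \\ E(\mathbf{X}_{k-1}^{\otimes 2}) - E(\tilde{\mathbf{X}}^{\otimes 2}) \end{bmatrix}.$$

Let us denote the multiplying matrix on the right hand side by $D$. We note that $D$ is block lower triangular and that due to the properties of the Kronecker product, $\varrho(A^{\otimes 2}) = (\varrho(A))^2 < \varrho(A)$ (here and throughout the paper, $\varrho$ denotes the spectral radius of a matrix). From these it is clear that $\varrho(D) = \varrho(A) < 1$. It is well-known that then there exists an induced matrix norm $\|\cdot\|_*$ for which $\varrho(D) < \|D\|_* < 1$. This, and the equivalence of vector norms suffice for the proof. □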
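The Kronecker product identities used in the proof above can be verified on a small numerical instance. The following sketch uses integer matrices stored as lists of rows so that the comparison is exact; the helper names and the particular matrices are chosen only for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
D = [[1, 1], [0, 2]]
c = [[5], [6]]                       # a column vector, stored as a 2x1 matrix

# mixed-product property: (AB) (x) (CD) = (A (x) C)(B (x) D)
general_lhs = kron(matmul(A, B), matmul(C, D))
general_rhs = matmul(kron(A, C), kron(B, D))

# special case with a column vector: (AB) (x) c = (A (x) c) B
vector_lhs = kron(matmul(A, B), c)
vector_rhs = matmul(kron(A, c), B)

if __name__ == "__main__":
    print(general_lhs == general_rhs, vector_lhs == vector_rhs)
```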

Remark. The finiteness of the respective moments of the stationary distribution can be derived using the 
same approach as in the proof of formulae (2.2.3), (2.2.4) and (2.2.10) in Barczy et al. (2011). We note 
that the stationary distribution has exactly as many finite moments as the innovation distribution and the 
initial distributions have in common, because the Bernoulli distribution is bounded and therefore all of its 
moments are finite. 

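A.3 The variance of partial sums of the process

The following lemma will be used for estimations in the next section and shows that while the values of the process are not independent, their dependence is very weak in the sense that the variance of the partial sums up to time $n$ is only linearly increasing with $n$.

Lemma A.2. In a stable INAR(p) model under $H_0$

(i) $\operatorname{Var}(X_1 + X_2 + \cdots + X_n) = \sum_{i,j=1}^n \operatorname{Cov}(X_i, X_j) = O(n)$,

(ii) $\operatorname{Var}(X_1 X_{1-q} + X_2 X_{2-q} + \cdots + X_n X_{n-q}) = \sum_{i,j=1}^n \operatorname{Cov}(X_i X_{i-q}, X_j X_{j-q}) = O(n)$ for all $0 \leq q \leq p-1$.

Proof. Although the lemma is stated for the process $(X_n)_{n \in \mathbb{N}}$, calculations will require that we investigate the process $(\mathbf{X}_n)_{n \in \mathbb{N}}$. Therefore, we will prove the following statements:

$$\left\|\operatorname{Var}(\mathbf{X}_1 + \mathbf{X}_2 + \cdots + \mathbf{X}_n)\right\| = O(n) \tag{A.10}$$

in the place of (i) and

$$\left\|\operatorname{Var}(\mathbf{X}_1^{\otimes 2} + \mathbf{X}_2^{\otimes 2} + \cdots + \mathbf{X}_n^{\otimes 2})\right\| = O(n) \tag{A.11}$$

in the place of (ii).

First we will prove (A.10). (A.6) implies that $(\|\operatorname{Var}(\mathbf{X}_i)\|)_{i \in \mathbb{N}}$ is a (convergent and hence) bounded sequence. Let us denote its upper bound by $U_1$. This observation means that we will only need to investigate the sum $\sum_{i,j=1}^n \operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_j) - \sum_{i=1}^n \operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_i)$, since the latter sum is clearly $O(n)$.

First we will use (A.3). It is immediate from (A.3) that

$$\mathbf{X}_k - E(\mathbf{X}_k) = A(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1})) + M_k e_1.$$

Let us now fix $1 \leq i < j$ and write

$$\begin{aligned} \operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_j) &= E\left[(\mathbf{X}_i - E(\mathbf{X}_i))(\mathbf{X}_j - E(\mathbf{X}_j))^\top\right] \\ &= E\left[(\mathbf{X}_i - E(\mathbf{X}_i))\, E(\mathbf{X}_j - E(\mathbf{X}_j) \mid \mathcal{F}_{j-1})^\top\right] \\ &= E\left[(\mathbf{X}_i - E(\mathbf{X}_i))(A(\mathbf{X}_{j-1} - E(\mathbf{X}_{j-1})))^\top\right] = \operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_{j-1}) A^\top. \end{aligned} \tag{A.12}$$

If we perform the calculations for the case $1 \leq j < i$ as well, we can see that, after $|j - i|$ iterations,

$$\operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_j) = A^{(i-j)^+} \operatorname{Var}(\mathbf{X}_{\min(i,j)}) (A^\top)^{(j-i)^+}. \tag{A.13}$$

Because $\varrho(A) < 1$, there exists a matrix norm $\|\cdot\|_*$ for which $\varrho(A) < \|A\|_* =: \pi < 1$. With this norm, (A.13), and the boundedness of $(\operatorname{Var}(\mathbf{X}_i))_{i \in \mathbb{N}}$ we can establish

$$\left\|\operatorname{Cov}(\mathbf{X}_i, \mathbf{X}_j)\right\|_* = O(\pi^{|i-j|}),$$

which yields (A.10) immediately.

For (A.11) our reasoning will be very similar, although with more tedious calculations. First we note that (A.6) implies boundedness for $\|\operatorname{Var}(\mathbf{X}_k^{\otimes 2})\|$ also. From (A.4) we have

$$\begin{aligned} E\left(\mathbf{X}_k^{\otimes 2} - E(\mathbf{X}_k^{\otimes 2}) \mid \mathcal{F}_{k-1}\right) &= A^{\otimes 2}\left(\mathbf{X}_{k-1}^{\otimes 2} - E(\mathbf{X}_{k-1}^{\otimes 2})\right) + \boldsymbol\alpha_2^\top(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1}))\, e_1^{\otimes 2} \\ &\quad + \mu\left[A(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1}))\right] \otimes e_1 + \mu\, e_1 \otimes \left[A(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1}))\right] \\ &= A^{\otimes 2}\left(\mathbf{X}_{k-1}^{\otimes 2} - E(\mathbf{X}_{k-1}^{\otimes 2})\right) + e_1^{\otimes 2} \boldsymbol\alpha_2^\top(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1})) \\ &\quad + \mu(A \otimes e_1)(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1})) + \mu(e_1 \otimes A)(\mathbf{X}_{k-1} - E(\mathbf{X}_{k-1})), \end{aligned}$$

analogously to (A.9). Now, similarly to (A.12) we get, for $1 \leq i < j$,

$$\begin{aligned} \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j^{\otimes 2}) &= \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1}^{\otimes 2})(A^{\otimes 2})^\top + \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1})\, \boldsymbol\alpha_2 (e_1^{\otimes 2})^\top \\ &\quad + \mu \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1})(A \otimes e_1)^\top + \mu \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1})(e_1 \otimes A)^\top. \end{aligned}$$

Here

$$\operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1}) := E\left[(\mathbf{X}_i^{\otimes 2} - E(\mathbf{X}_i^{\otimes 2}))(\mathbf{X}_{j-1} - E(\mathbf{X}_{j-1}))^\top\right],$$

a $p^2 \times p$ matrix. Also similarly to (A.12) we have

$$\operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j) = \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1}) A^\top.$$

Summarizing, we get the following regression:

$$\begin{bmatrix} \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j)^\top \\ \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j^{\otimes 2})^\top \end{bmatrix} = \begin{bmatrix} A & 0 \\ e_1^{\otimes 2} \boldsymbol\alpha_2^\top + \mu(A \otimes e_1) + \mu(e_1 \otimes A) & A^{\otimes 2} \end{bmatrix} \begin{bmatrix} \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1})^\top \\ \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_{j-1}^{\otimes 2})^\top \end{bmatrix}. \tag{A.14}$$

Note that the multiplying matrix on the right hand side is just $D$ from the proof of Lemma A.1. Now, similarly to (A.13), we have

$$\begin{bmatrix} \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j)^\top \\ \operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_j^{\otimes 2})^\top \end{bmatrix} = D^{(j-i)^+} \begin{bmatrix} \operatorname{Cov}(\mathbf{X}_{\min(i,j)}^{\otimes 2}, \mathbf{X}_{\min(i,j)})^\top \\ \operatorname{Cov}(\mathbf{X}_{\min(i,j)}^{\otimes 2}, \mathbf{X}_{\min(i,j)}^{\otimes 2})^\top \end{bmatrix}. \tag{A.15}$$

Now we only need to note that $(\operatorname{Cov}(\mathbf{X}_i^{\otimes 2}, \mathbf{X}_i))_{i \in \mathbb{N}}$ is a bounded sequence (due to (A.6)), and we can finish the proof of (A.11) in the same way as (A.10). □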



A. 4 Strong consistency of the estimates 

In this subsection we want to show with our notations the fact, proven already by Du and Li (1991) for 9, 
that the estimates for the coefficients and the mean and variance of the innovation given in Section 2 are 
strongly consistent, i.e., 

(A.16) 



d^^^^e and a'^a^. 



First of all, it is a matter of simple calculations (see C.2) and a straightforward application of (A. 5) that, 
provided that the second moment of £i exists, 



1 " ~ 
-^Ml^a2E{X) + a^ 



(A.17) 



fc=i 



with CX2 given in (2.6). Hence, taking the limits of the expectations in (A. 3) and (A. 4) we have 



-®2 



'(8)2 



E(X) = AE(X)+/i and E(X ) = A*^^E(X ) + (/x^+a^ E(X)+CT^)+iu(eiOAE(X)+AE(X)(g)ei). 
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Prom this we can deduce, using (A. 6) and an argument from the proof of Lemma B.l, 
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d = e. 



A similar result can be derived for the estimate of cr^. By recalling (A. 17) and computing the strong limit 
of the other summands in (2.4), we obtain the strong consistency of immediately. The same reasoning 
shows that if the second moment of the stationary distribution is finite (in this case we already know that 
6 is a consistent estimator), then the limits of the estimators ct^ and are the same almost surely; hence, 
the strong consistency of is established. 
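The strong consistency discussed in this subsection can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes Poisson innovations, a zero initial state, and the usual least-squares form of the CLS estimator, $\hat{\boldsymbol\theta}_n = Q_n^{-1}\sum_k X_k (\boldsymbol X_{k-1}^\top, 1)^\top$. The function names, parameter values and seed are illustrative choices only.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def thin(rng, alpha, x):
    """Binomial thinning: sum of x i.i.d. Bernoulli(alpha) variables."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar(rng, alphas, mu, n):
    """Time-homogeneous INAR(p) path started from p zeros (assumed initial state)."""
    xs = [0] * len(alphas)
    for _ in range(n):
        offspring = sum(thin(rng, a, xs[-i]) for i, a in enumerate(alphas, start=1))
        xs.append(offspring + poisson(rng, mu))
    return xs


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small linear system."""
    m = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = (aug[r][m] - tail) / aug[r][r]
    return sol


def cls_estimate(xs, p):
    """Solve Q_n theta = sum_k X_k (X_{k-1}, ..., X_{k-p}, 1)^T for theta."""
    q = [[0.0] * (p + 1) for _ in range(p + 1)]
    rhs = [0.0] * (p + 1)
    for k in range(p, len(xs)):
        reg = [float(xs[k - i]) for i in range(1, p + 1)] + [1.0]
        for i in range(p + 1):
            rhs[i] += xs[k] * reg[i]
            for j in range(p + 1):
                q[i][j] += reg[i] * reg[j]
    return solve(q, rhs)


rng = random.Random(0)
sample = simulate_inar(rng, alphas=[0.3, 0.2], mu=1.0, n=20000)
theta_hat = cls_estimate(sample, p=2)  # (alpha_1, alpha_2, mu) estimates
print(theta_hat)
```

With a long sample the printed estimates should lie close to the assumed true values (0.3, 0.2, 1.0); shortening `n` makes the sampling error visible.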

A.5 Proof of Theorem 3.1 

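We recall that the theorem was stated under the null hypothesis, so we will assume it to hold. Let

$$\mathcal{Z}_n(t) := \frac{1}{\sqrt{n}} \sum_{k=1}^{\lfloor nt \rfloor} \boldsymbol{Z}_k, \quad t \in [0,1], \qquad \boldsymbol{Z}_k := M_k \begin{bmatrix} \boldsymbol{X}_{k-1} \\ 1 \end{bmatrix}.$$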



The proof will be based on the following theorem. 
Theorem A. 3. Under the assumptions of Theorem 3.1 

Zr, A W, n^oo, 

where $(\mathcal{W}(t))_{0 \le t \le 1}$ is a $(p+1)$-dimensional standard Wiener process and

J := E 













X 




X 


{alX + a^) 












1 




1 




V 





fc = l,2. 



(A.18) 



Proof. By (A.5) we have $n^{-1} I_n \to I$ almost surely, and since $\hat{\sigma}_n^2$ and $\hat{\boldsymbol{\theta}}_n$ are strongly consistent estimators, we have $n^{-1} \hat{I}_n \to I$ almost surely as well. We will use the martingale central limit theorem for the martingale differences
$n^{-1/2} \boldsymbol{Z}_k$, $n \in \mathbb{N}$, $k = 1, 2, \ldots, n$. To compute the variance function, we write



L ^ fc=i 



fe=i 



1 



-X'fc-i 

1 



1 



1 















X 




X 




1 




1 




V 











It remains to check the so-called conditional Lindeberg condition: 



\nt\ 

EE 

fe=i 



-^Zk 



X{||„-i/2z,||>5} 







4 


( 


—i=Zk 











L"tJ 



fe=i 

L"tJ 



fe=i 



where P is a polynomial of degree six, because $E(M_k^4 \mid \mathcal{F}_{k-1})$ is a second-degree polynomial of $\boldsymbol{X}_{k-1}$ (this
is detailed in C.2). The sixth moment of the stationary distribution is finite due to the assumptions, hence
(A.5) implies

— ^P(Xfe_i)^E(P(X))<oo. 

\nt\ 



This means 



(52 



fe=i 



implying Lindeberg's condition. All the conditions of the martingale central limit theorem have been checked; 
the proof is therefore complete. □ 

Having proved Theorem A. 3, we now turn to the proof of Theorem 3.1. For this, let us introduce the 
notation 



'k 



zi" ■= m(") 



1 



fc = l,2,. 
First we note that



[ntj [nt\ [ntj \_nt\ [rUj 



/s=l 



fc=l fe=l 



fe=l fe=l 



1 



Recalling the definitions of and 
_ ( 

V 

We have 



1 



Xk — 



-X'fc-1 

1 



- On) 



1 



- = 



( n 

Y.XU 



k=l 



E^^ 
fe=i 



fe=i 



-X"fc-i 

1 
1 
1 



E 

k=l 



Xk-1 

1 



1 



hence 



— 1 \ k — 1 k — 1 / 



In the next step we notice that according to (A. 5) we have 



Q[nt\ Qn 



1 lnt\ ( 1 



-QVntl ] ( -Qn 



where -Ep+i is the p + l-dimensional identity matrix and 
Now we apply (A. 20), Theorem A. 3, and 



tQuoQHo=tEp+i Vie [0,1], 



/ 












X 




X 


V 


1 




1 










J 



(A.19) 



(A.20) 
to conclude that




te[o,i] 




mt)-tW{l))te[o,i]- □ 



B The process under the alternative hypothesis

While in Theorem 3.1 we were able to consider longer and longer samples taken from the same process, this
approach has to be modified for the alternative hypothesis. More precisely, we have to consider a series of
time-inhomogeneous INAR(p) processes, where the n-th one has a point of change at $\lfloor n\rho \rfloor$ (we will suppress
the dependence on n in the notation for simplicity). Now, the parts of these processes before the change (i.e., $(X_i)_{i=1}^{\lfloor n\rho \rfloor}$) can
be handled as a sample taken from an infinite INAR(p) process (at least in distribution), but this is not true
for the second part (i.e., $(X_i)_{i=\lfloor n\rho \rfloor + 1}^{n}$), because the initial distribution of this process depends on n.
Therefore, for a rigorous analysis we need to refine the results of A.2.

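B.1 Ergodicity under the alternative hypothesis

We have

$$\frac{1}{n - \lfloor n\rho \rfloor} \sum_{k=\lfloor n\rho \rfloor + 1}^{n} g(\boldsymbol{X}_k) \xrightarrow{P} E\big(g(\widetilde{\boldsymbol{X}}'')\big), \tag{B.1}$$

where $g : \mathbb{N}_0^p \to \mathbb{R}$ with $E(|g(\widetilde{\boldsymbol{X}}'')|) < \infty$. Indeed, for an arbitrary $\varepsilon > 0$,

$$P\Bigg(\bigg|\frac{1}{n - \lfloor n\rho \rfloor} \sum_{k=\lfloor n\rho \rfloor + 1}^{n} g(\boldsymbol{X}_k) - E\big(g(\widetilde{\boldsymbol{X}}'')\big)\bigg| > \varepsilon\Bigg) = \sum_{\boldsymbol{x} \in \mathbb{N}_0^p} P\Bigg(\bigg|\frac{1}{n - \lfloor n\rho \rfloor} \sum_{k=\lfloor n\rho \rfloor + 1}^{n} g(\boldsymbol{X}_k) - E\big(g(\widetilde{\boldsymbol{X}}'')\big)\bigg| > \varepsilon \,\Bigg|\, \boldsymbol{X}_{\lfloor n\rho \rfloor} = \boldsymbol{x}\Bigg)\, P(\boldsymbol{X}_{\lfloor n\rho \rfloor} = \boldsymbol{x}),$$

and the conditional probabilities on the right hand side converge to zero by the ergodic theorem for each $\boldsymbol{x} \in \mathbb{N}_0^p$; additionally,

$$P(\boldsymbol{X}_{\lfloor n\rho \rfloor} = \boldsymbol{x}) \le \big|P(\boldsymbol{X}_{\lfloor n\rho \rfloor} = \boldsymbol{x}) - P(\widetilde{\boldsymbol{X}}' = \boldsymbol{x})\big| + P(\widetilde{\boldsymbol{X}}' = \boldsymbol{x}),$$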



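and one can use (A.7). We will also apply that for all $\varepsilon > 0$ there exists $\nu_\varepsilon$ such that

$$\|E(\boldsymbol{X}_{\lfloor n\rho \rfloor + k}) - E(\widetilde{\boldsymbol{X}}'')\| < \varepsilon \quad \text{for all } n \ge \nu_\varepsilon \text{ and all } k \ge \nu_\varepsilon. \tag{B.2}$$

First observe that there exists $\pi'' \in (0,1)$ such that $\|E(\boldsymbol{X}_{\lfloor n\rho \rfloor + k}) - E(\widetilde{\boldsymbol{X}}'')\| \le (\pi'')^k \|E(\boldsymbol{X}_{\lfloor n\rho \rfloor}) - E(\widetilde{\boldsymbol{X}}'')\|$
for all $k \in \mathbb{N}$, see (A.8). Next, for all $\eta > 0$, choose $\nu_1$ and $\nu_2$ such that $(\pi'')^k < \eta$ for all $k \ge \nu_1$ and
$\|E(\boldsymbol{X}_{\lfloor n\rho \rfloor}) - E(\widetilde{\boldsymbol{X}}')\| < \eta$ for all $n \ge \nu_2$. Hence

$$\|E(\boldsymbol{X}_{\lfloor n\rho \rfloor + k}) - E(\widetilde{\boldsymbol{X}}'')\| \le \eta\big(\|E(\boldsymbol{X}_{\lfloor n\rho \rfloor}) - E(\widetilde{\boldsymbol{X}}')\| + \|E(\widetilde{\boldsymbol{X}}') - E(\widetilde{\boldsymbol{X}}'')\|\big) < \eta^2 + \eta\, \|E(\widetilde{\boldsymbol{X}}') - E(\widetilde{\boldsymbol{X}}'')\|.$$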



B.2 Behaviour of the estimates under the alternative hypothesis 

In this subsection we will investigate the weak limit of the CLS estimates under the conditions of Theorem
4.1. We will also show how the quantity $\psi$ arises naturally if we would like to know how much the
misestimation of the parameters influences our test process.

Lemma B.1. Under the assumptions of Theorem 4.1 we have



On > '■ — 



Proof. By (A. 5) and (B.l) we obtain 











a 






a 

















(1 - p)0 




1 _ 1 1 

~Qn — ~Q\np\ y, 



x-k-l 




Xk-1 


1 




1 



pC + (1 - p)C" = Q, 



k=\np\+l 

as $n \to \infty$. Moreover, we can notice that for a homogeneous model the process



Xk 
Xk-1 



satisfies a similar recursion to (A.1). The equivalent of the matrix A can then be shown to have a spectral
radius smaller than 1, and by the same reasoning as in A.1 we can conclude that $(\boldsymbol{U}_k)_{k \in \mathbb{N}}$ is ergodic, and
we can apply (A.5). Moreover, it is clear that if $\boldsymbol{U}$ denotes a vector with the unique stationary distribution
of $(\boldsymbol{U}_k)_{k \in \mathbb{N}}$ then

^ [7(2) 

c7(3) 



[/(p+i) 



^x 
and for the components of $\boldsymbol{U}$ we also have



f/(2) [/(p + l) 



V 



where = « = 1, . . . ,p, j G N and e = £i such that ah these variables are totally independent and 

also independent of (f/*-^-', . . . , U^P'^^^)'^ . Hence (A. 5) implies. 



-, L"pJ 

n. 



fe=l 



1 



pE 



x'_. 



x' 


) 


1 





pE (aiX'o + ■■■ + apX'_p_^_^ + p! 



x 




= pC' 


a. 








1 









as $n \to \infty$ (here $\varepsilon' = \varepsilon_1$). In a similar way, using (B.1)



1 " 

- E 



Xk-1 

1 



(l-p)C" 



For the rate of convergence we have the following result:

Lemma B.2. Under the conditions of Theorem 4.1 we have

$$\hat{\boldsymbol{\theta}}_n - \tilde{\boldsymbol{\theta}} = O_P(n^{-1/2}).$$

Proof. The difference can be decomposed in the following way:

nV2(0„_0) = („-iQj-i„-i/2 



L"pJ 

E^^ 

fc=i 



Xk-i 

1 



-QnQ 



pC 



+ E 

fe= Ln^)J+l 



Xk-i 




a" 






- QnQ ' 1^(1 - P)C" 




)1 


1 




p" 





□ 



(B.3) 



The first factor converges to $\widetilde{Q}^{-1}$ stochastically, and will therefore be omitted from further calculations.
The second factor has been split in two and only the first part will be analyzed in detail. The analysis of
the second part is completely analogous. We split the first part in the second factor in the following way:



-1/2 



[?ipj 

fe=i 



-1/2 







a' 










)1 


1 









/ L«pJ 

^fe=i 



1 



The first term is 



which is asymptotically normal, and therefore $O_P(1)$ according to Theorem A.3 (the same reasoning applies
after the change, since Lindeberg's theorem is valid for triangular arrays as well). We will decompose
$(Q_{\lfloor n\rho \rfloor} - \rho\, Q_n \widetilde{Q}^{-1} C')$ in the following way:

- n-y'{p[Q„ - E{QJ]Q~'C'} - n-'/'{p[E{QJ - nQ]Q~'c'} 
-n-^/^{pnC' - \np\C'}. 



The last term is deterministic and o(l). We know from (A. 11) that the variances of the first and third terms 
are bounded. Denoting the common upper bound by K we have, from Markov's inequality, for all n. 



P n 



Q\_np\ - E(Q|„p|) 



\np\. 



2 \ K 

> a] < ^Oasa— >-oo, 

a 



and similarly for the third term. Consequently, the first and third terms are Op(l). Recalling Lemma A.l 
we have 



E(Q,„pj)-MC" 



[np] 

E 

fc=i 



[npj 

<E 

fe=i 



E 



E 



1 

Xk-i 

1 



Xk-1 

1 

Xk-1 

1 



-C 



c 



J 



\np\ 

<^7r'= = 0(l), 

/c=l 



because the matrices within the sum consist entirely of the entries of — X .A similar calculation is 
valid for the fourth term. This implies the boundedness of the second and fourth terms. □ 
B.3 Proof of Theorem 4.1

Now we turn to the proof of our main results: Theorem 4.1 and Theorem 4.2. We will use the following
notations:

$$M'_k := X_k - \boldsymbol{\alpha}^\top \boldsymbol{X}_{k-1} - \mu', \qquad \boldsymbol{Z}'_k := M'_k \begin{bmatrix} \boldsymbol{X}_{k-1} \\ 1 \end{bmatrix},$$

and similarly for $M''_k$ and $\boldsymbol{Z}''_k$. The proof will be given for the process before $\lfloor n\rho \rfloor$ in detail. The analysis
of the process after $\lfloor n\rho \rfloor$ can be handled analogously. In the proof we will rely repeatedly on ideas from
Hušková et al. (2007).

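The first step is the following decomposition for $k \le \lfloor n\rho \rfloor$:

$$\begin{aligned}
\hat{M}_k^{(n)} = M'_k &+ \big[(\boldsymbol{\alpha} - \tilde{\boldsymbol{\alpha}})^\top E(\boldsymbol{X}_{k-1}) + (\mu' - \tilde{\mu})\big] \\
&+ (\boldsymbol{\alpha} - \hat{\boldsymbol{\alpha}}_n)^\top (\boldsymbol{X}_{k-1} - E(\boldsymbol{X}_{k-1})) \\
&+ \big[(\tilde{\boldsymbol{\alpha}} - \hat{\boldsymbol{\alpha}}_n)^\top E(\boldsymbol{X}_{k-1}) + (\tilde{\mu} - \hat{\mu}_n)\big].
\end{aligned} \tag{B.4}$$

For $k > \lfloor n\rho \rfloor$, the quantities $M'_k$ and $\mu'$ have to be replaced with $M''_k$ and $\mu''$, respectively. Based on
(B.4) we have: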



max f V - 



^ max 



max 

l<k<n 



\np\/\k [npjvfc 

i=l i=[rapj+l 
[npj Afe 

£ [(a-5)^E(X,_i) + (M'-M)] 



i=i 



+ max 



[ripj Vfe 

Y [(a-S)^E(Xi_i) + (M"-M)-nV'] 

i=[npj+l 

5](a-a„)^(X,_i-E(Xi_i)) 



max 



i=l 



^ [{a. - a„)^ E(Xfc_i) + (/X - At„)] 



i=l 



(B.5) 



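The second term in (B.5) is deterministic. For its more detailed analysis we need the following lemma.

Lemma B.3. Under the assumptions of Theorem 4.1 and using the notations from there we have

$$(\boldsymbol{\theta}' - \tilde{\boldsymbol{\theta}})^\top \begin{bmatrix} E(\widetilde{\boldsymbol{X}}') \\ 1 \end{bmatrix} = \frac{\psi}{\rho} > 0, \qquad (\boldsymbol{\theta}'' - \tilde{\boldsymbol{\theta}})^\top \begin{bmatrix} E(\widetilde{\boldsymbol{X}}'') \\ 1 \end{bmatrix} = -\frac{\psi}{1-\rho} < 0,$$

with

$$\boldsymbol{\theta}' := \begin{bmatrix} \boldsymbol{\alpha} \\ \mu' \end{bmatrix}, \qquad \boldsymbol{\theta}'' := \begin{bmatrix} \boldsymbol{\alpha} \\ \mu'' \end{bmatrix}.$$

Proof. We consider

$$\boldsymbol{\theta}' - \tilde{\boldsymbol{\theta}} = (1-\rho)\, \widetilde{Q}^{-1} C'' (\boldsymbol{\theta}' - \boldsymbol{\theta}'') = (1-\rho)(\mu' - \mu'')\, \widetilde{Q}^{-1} C'' \boldsymbol{e}_{p+1}.$$

We can write

$$\begin{bmatrix} E(\widetilde{\boldsymbol{X}}') \\ 1 \end{bmatrix} = C' \boldsymbol{e}_{p+1},$$

whence the first equality in the statement is immediate. The second equality can be proven by the same
reasoning. Now we need to show that $\psi > 0$. Indeed,

$$\psi = \rho(1-\rho)(\mu' - \mu'')\, \boldsymbol{e}_{p+1}^\top C' \widetilde{Q}^{-1} C'' \boldsymbol{e}_{p+1},$$

and $C'$, $C''$ and $\widetilde{Q}^{-1}$ are positive definite matrices. The first two are covariance matrices and because
at least one of the offspring and innovation distributions is nondegenerate, they will be positive definite
(see C.1). The matrix $\widetilde{Q}^{-1}$ is then the inverse of a convex combination of positive definite matrices, hence
positive definite itself. □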

Now the second term in (B.5) can be rewritten as 



max 

l<k<n 



[npj Ak 

{e'-ey 











x' 
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-E 








1 




1 



+ 



[npJ Vfe 
i=[np\-\-l 



^ max 



[npJ Ak 



E 



E 







— // 






X 


-E 




1 




1 






x' 


-E 




1 




1 



1-p 



— ntp 



max 

l<k<n 



max 



[npJ Vfe 

i0"-O)^ E 

f{[np\Ak) 







-E 


— // 
X 


)1 






1 




1 







n ) 



Because of Lemma A.1, the first two maxima are bounded, and the third one is obviously attained at $k = \lfloor n\rho \rfloor$
with value $\big(\tfrac{\lfloor n\rho \rfloor}{\rho} - n\big)\psi$, which is also bounded. Therefore the second term in (B.5) is $O(1)$ as $n \to \infty$.
We can use Lemma B.2 to show that the fourth term in (B.5) is $O_P(\sqrt{n})$, but for the first and third terms
we also need another result. This can be found as Theorem 3.1 in Kokoszka and Leipus (1998).

Lemma B.4. Let $(Y_n)_{n \in \mathbb{N}}$ be a sequence of random variables with finite second moments, and let $(c_n)_{n \in \mathbb{N}}$
be a sequence of nonnegative constants. Then, for any $a > 0$,



a P max Ck 



n-l 



n-l 



1/2 



+ 2^4+JE(y,\i) ^E(y,r,) 



fc=i 

n-l 



fc=o 



Lemma B.4 can be applied to the third term in the following way to show that it is $O_P(n^{1-\gamma})$:

Lemma B.5. For a time-homogeneous INAR(p) process satisfying condition $C_0$ and any $\gamma < \frac{1}{2}$ we have

$$\max_{1 \le k \le n} k^{\gamma - 1} \bigg\| \sum_{i=1}^{k} (\boldsymbol{X}_{i-1} - E(\boldsymbol{X}_{i-1})) \bigg\| = O_P(1).$$

Proof. We will follow the proof of Lemma 4.2 in Hušková et al. (2007) and apply Lemma B.4 with $c_k = k^{\gamma - 1}$
and $Y_{i,q} = X_{i-1-q} - E(X_{i-1-q})$ for $0 \le q \le p-1$ to show that the result holds for each component of
the vectors. This implies convergence of the 1-norm, and because of the equivalence of vector norms, it is
sufficient for the proof of the statement. We have

$$\left| \frac{1}{(k+1)^{2-2\gamma}} - \frac{1}{k^{2-2\gamma}} \right| \le \frac{2 - 2\gamma}{k^{3-2\gamma}}$$

and

$$\sum_{i=1}^{k} \sum_{j=1}^{k} E(Y_{i,q} Y_{j,q}) = \sum_{i=-q}^{k-1-q} \sum_{j=-q}^{k-1-q} \operatorname{Cov}(X_i, X_j) \le \kappa k$$

for some constant $\kappa$ according to Lemma A.2. Therefore,



n-l 

E 

fe=i 



(fc + 1)2-27 fc2-27 



fe k 



n-l 



1/2 



i=l j=l k=l \i,j=l 



+ 2^fc2^-E(n%J 
fe=o 

n-l 



n-l 



n-l 



fe=l 



fe=l 



fe=0 



where $U_1$ is an upper bound of $(\operatorname{Var}(X_n))_{n \in \mathbb{N}}$. The limit of the right hand side as $n \to \infty$ is
finite, which completes the proof. We note the necessity of $\gamma < \frac{1}{2}$: otherwise the second term in the last
expression would not be bounded. This indicates that if we would like to extend Theorem 4.1 to $\gamma = \frac{1}{2}$ we
need sharper estimates in place of Lemma B.4. □

For the estimation of the first term we will use Lemma B.4 again:

Lemma B.6. For a time-homogeneous INAR(p) process satisfying condition $C_0$ and any $\gamma \in (0, \frac{1}{2})$ we have

$$\max_{1 \le k \le n} k^{\gamma - 1} \bigg| \sum_{i=1}^{k} M_i \bigg| = O_P(1).$$

Proof. We apply Lemma B.4 in the same way as in the proof of Lemma B.5 with $c_k = k^{\gamma - 1}$ and $Y_i = M_i$. We note
that the $M_i$ are martingale differences, therefore any product $M_i M_j$, $i \ne j$, has zero mean. Furthermore, the
sequence $(\operatorname{Var} M_k)_{k \in \mathbb{N}}$ is clearly bounded, and denoting its upper bound by U, we have



n-l 

E 

fe=i 



1 



1 



{k + 1)2-27 fc2-27 



k k n— 1 / k 

i—1 j — 1 k—1 



1/2 



n-l 



+2^fc^--^E(n%) 

k=0 

n—1 n — 1 

< U{2 - 27) ^ fc27-2 + 2UY fc""'""^' + 2C/ E 

k=l k=l k=0 

whence the final steps of the proof are the same as in Lemma B.5. 



□ 



Summarizing our results for the terms of (B.5): the second term is $O(1)$, the fourth one is $O_P(n^{1/2})$ and
the first and third terms are $O_P(n^{1-\gamma})$, which completes our proof. □
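The role of the dominant deterministic drift in the decomposition (B.4) can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the paper's procedure: it uses an INAR(1) process with Poisson innovations, a downward shift in the innovation mean at a known time `change`, a whole-sample least-squares fit, and it takes the location of the maximum of the partial sums of the fitted residuals as a change-point estimate. All names, parameter values and the seed are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_mean_change(rng, alpha, mu_before, mu_after, n, change):
    """INAR(1) path X_0, ..., X_n whose innovation mean switches after `change`."""
    xs = [0]
    for k in range(1, n + 1):
        mu = mu_before if k <= change else mu_after
        survivors = sum(1 for _ in range(xs[-1]) if rng.random() < alpha)
        xs.append(survivors + poisson(rng, mu))
    return xs


def cls_inar1(xs):
    """Least-squares fit of X_k on (X_{k-1}, 1) over the whole sample."""
    x, y = xs[:-1], xs[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    alpha_hat = sxy / sxx
    return alpha_hat, mean_y - alpha_hat * mean_x


def argmax_partial_sums(xs):
    """Index k maximising the partial sums of the fitted residuals."""
    alpha_hat, mu_hat = cls_inar1(xs)
    best_k, best_value, running = 0, -math.inf, 0.0
    for k in range(1, len(xs)):
        running += xs[k] - alpha_hat * xs[k - 1] - mu_hat
        if running > best_value:
            best_k, best_value = k, running
    return best_k


rng = random.Random(1)
n, change = 4000, 2000
path = simulate_mean_change(rng, alpha=0.3, mu_before=4.0, mu_after=1.0,
                            n=n, change=change)
tau_hat = argmax_partial_sums(path)
print(tau_hat)
```

Because the fitted residuals have positive mean before the change and negative mean after it, the partial sums rise roughly linearly up to the change point and fall afterwards, so the maximiser lands near `change`. For an upward shift one would look at the minimum instead.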
B.4 Proof of Theorem 4.2

The statement can be written in the form

$$\lim_{K \to \infty} \sup_{n \in \mathbb{N}} P(|\hat{\tau}_n - \lfloor n\rho \rfloor| \ge K) = 0,$$

which is equivalent to

$$\lim_{K \to \infty} \limsup_{n \to \infty} P(|\hat{\tau}_n - \lfloor n\rho \rfloor| \ge K) = 0.$$



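Hence to prove the statement it is enough to show that

$$\lim_{K \to \infty} \limsup_{n \to \infty} P\Bigg( \max_{\lfloor n\rho \rfloor - K < k < \lfloor n\rho \rfloor + K} \sum_{i=1}^{k} \hat{M}_i^{(n)} \le \max_{1 \le k \le \lfloor n\rho \rfloor - K} \sum_{i=1}^{k} \hat{M}_i^{(n)} \Bigg) = 0, \tag{B.6}$$

$$\lim_{K \to \infty} \limsup_{n \to \infty} P\Bigg( \max_{\lfloor n\rho \rfloor - K < k < \lfloor n\rho \rfloor + K} \sum_{i=1}^{k} \hat{M}_i^{(n)} < \max_{\lfloor n\rho \rfloor + K \le k \le n} \sum_{i=1}^{k} \hat{M}_i^{(n)} \Bigg) = 0. \tag{B.7}$$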

For (B.6) we consider with a constant K, K < [np\ , the estimate 

(fc k 
max Vm^^^ max VM^^ 

[npl-K<k<[npi+K ■' l^k^[np\-K ^ ■' 

y Mf ^ < max y ) = P min y Mf ^ ^ 

^ ^ l^k^lnpi-K f^^ J j yi^k^lnpi-K ^ 

( [np\ \ / Vnp\ \ 

= P min y Mf') ^ = P min Y M^") < . 

We will use (B.4) again, note that the dominant term is the second one, and estimate the probability 
with this in mind. For any K ^ i ^ [np\ the expression 

L"pJ 

J2 ^i"^ (B-8) 

j=[npi-e+i 

can be decomposed according to (B.4). Now, (B.8) can only be negative in two cases: cither the second term 
in the decomposition is less or equal to ^, or it is greater — in which case one of the other three terms has 
to be less than — ^ (for the definition of ip, see Theorem 4.1). 
Now it is clear that, after applying (B.4) to (B.8) we have



P I mm 



[np] 



[np] 



s^P mill V [(a-5)TE(Xfc_i) + (/z'-M)] ^ 



+ P I max 



+ P I max 

Ksiisilnp]-! 



P I max 

K^e^[np]-l 



j=[npi-e+l 

[np] 



J2 



t 

6 



j=[np]-e+l 
[np] 

e-' J2 (a-S„r(^fe-i-E(Xfc_i)) 

j=[np]-e+l 
[npl 

j=[npl-e+l 



t 

6 



As a consequence of Lemma B.3 and (A.6) the first term can be shown to converge to zero for any K as
$n \to \infty$ with the help of the following simple lemma:

Lemma B.7. Let $a_n \to a > 0$, $n \to \infty$ and $a_i > 0$ for all $i \in \mathbb{N}$. Then

$$\min_{1 \le k \le n} k^{-1} \sum_{i=n-k+1}^{n} a_i \to a, \quad n \to \infty.$$

Proof. First we note that for any $\varepsilon > 0$ and sufficiently large n, we have $\min_{1 \le k \le n} k^{-1} \sum_{i=n-k+1}^{n} a_i < a + \varepsilon$.
This can be seen by choosing $k = 1$ for every n. Now we show $\min_{1 \le k \le n} k^{-1} \sum_{i=n-k+1}^{n} a_i > a - \varepsilon$. Let
$\nu(\varepsilon)$ be the threshold index so that for $n > \nu(\varepsilon)$ we have $|a_n - a| < \frac{\varepsilon}{2}$. Let us denote by K the sum
$\sum_{i=1}^{\nu(\varepsilon)} a_i$. Clearly,



mm 

l^k^n—vis) 



k ^ ai — a 



i=n—k+l 



S 

<2- 



Furthermore, for any n > k > n — i'{e) we have 



k-' T ai> n ^ ^ ai 



i=n-k+l 



Me) 



n — v{e) 
n 



i=y(£) 



For sufficiently large n the first factor is close to 1, and the second factor is closer to a than $\frac{\varepsilon}{2}$ for every
n. This suffices for the proof. □
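A quick numerical check can make the statement of Lemma B.7 tangible. The sketch below is purely illustrative: the sequence $a_i = 2 + (-1)^i/\sqrt{i}$ is an arbitrary positive sequence with limit 2 chosen for this example, and the helper names are hypothetical.

```python
import math


def min_tail_average(a):
    """Minimum over k = 1..n of the average of the last k entries of a."""
    best = math.inf
    running = 0.0
    for k in range(1, len(a) + 1):
        running += a[len(a) - k]  # add a_{n-k+1} (1-based indexing)
        best = min(best, running / k)
    return best


def example_sequence(n, limit=2.0):
    """Positive, non-monotone sequence converging to `limit`."""
    return [limit + (-1) ** i / math.sqrt(i) for i in range(1, n + 1)]


for n in (20, 200, 2000):
    print(n, min_tail_average(example_sequence(n)))
```

The printed minima approach the limit 2 as n grows, as the lemma asserts; the early, far-from-limit terms are averaged out because they only enter tail averages with large k.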



Because of (A. 6) and Lemma B.2, the fourth term also converges to zero for all K as n — >■ oo. Indeed, 
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(a - a„) 



and 



[npj 



max V E(Xfc_i) < max E(Xft) 



\np\ 



i=\np\-l+\ 

and due to (A.6) the right hand side is bounded as $n \to \infty$. The same reasoning applies to $(\tilde{\mu} - \hat{\mu}_n)$. The
convergence of the second and third terms is summarized in the following lemma.

Lemma B.8. For a time-homogeneous INAR(p) process satisfying condition $C_0$ we have for any $a > 0$,



lim limsupP max 



[np\ 

^ - E(X,_i)) 

j=[npi-e+l 



> a = 



and 



lim lim sup P max 

K->-oo n^oo \K^e^[npi-l 



[npJ 

E 

j=[npi-e+l 



> a \ =0. 



Proof. Similarly to the proof of Lemma B.5 we will again employ Lemma B.4 with Cfe = {K + k — 1) ^ and 

^1,9 = T,\Z^inpi-K+i ^3-1 and Yi^q = Xynp^_K+i-i-q for i ^ 2 and < q p - 1. 
By an easy calculation 

k Vnp\ 

J2 mYj) = E - E(X,_i))(X,_i - E(X,_i))). 

j,i=l i,j=[np\—K—k+l 

Therefore, applying the same estimates and notations as in the proof of Lemma B.5 with $\gamma = 0$, we obtain
the following upper bound for the probability in question:



LnpJ-l 



L«pj-i 



[npl-1 



e=K 



K 



l=K-l 



It is obvious that as $n \to \infty$ and then $K \to \infty$, the above expression converges to 0, which suffices for
our proof. For the second statement the arguments are the same. We note that $(M_n)_{n \in \mathbb{N}}$ is a martingale
difference sequence, hence its elements are pairwise uncorrelated. Furthermore, $(\operatorname{Var}(M_n))_{n \in \mathbb{N}}$ is bounded,
which implies $\operatorname{Var}(M_1 + \ldots + M_n) = O(n)$ immediately. □

To prove (B.7) the proof is analogous with one exception: in place of Lemma B.7 we need the following
result.

Lemma B.9. Let $a_n \to a > 0$, $n \to \infty$ and $a_i > 0$ for all $i \in \mathbb{N}$. Then



lim inf k ^ \^ = a, n ^ 00. 



i=l 
Proof. We only need to observe that convergence of $a_n$ implies convergence in Cesàro mean as well; therefore,
for a sufficiently large K and for all $k \ge K$ the average $k^{-1} \sum_{i=1}^{k} a_i$ is close to a.



□ 



B.5 Adapting the proofs of Theorem 4.1 and Theorem 4.2 to other forms of the alternative hypothesis

The proofs of Theorem 4.1 and Theorem 4.2 heavily exploit the relatively simple structure of the test process
for detecting change in $\mu$ only. For the other parameters, even the limit in Theorem 4.1 will be different,
namely, we have the following theorem.

Theorem B.10. Suppose that $H_A$ holds with $\tau = \max(\lfloor n\rho \rfloor, 1)$, $\rho \in (0,1)$, and the change is only in
$\alpha_q$, namely, $\alpha_q$ changes from $\alpha'_q$ to $\alpha''_q$, where $\alpha'_q > \alpha''_q > 0$. Suppose, furthermore, that condition $C_A$
holds. Then for any $\gamma \in (0, \frac{1}{2})$ we have

$$\max_{1 \le k \le n} \sum_{j=1}^{k} \hat{M}_j^{(n)} X_{j-q} = n\psi_q + O_P(n^{1-\gamma}) \quad \text{as } n \to \infty,$$

with

$$\psi_q := \rho(1-\rho)(\alpha'_q - \alpha''_q)\, \boldsymbol{e}_q^\top C'' \widetilde{Q}^{-1} C' \boldsymbol{e}_q > 0,$$

where $\boldsymbol{e}_q$ is the q-th ($(p+1)$-dimensional) unit vector.

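The proof is analogous to that of Theorem 4.1. All the techniques in the proofs can be directly adapted,
noting that all means converge due to (A.6), and Lemma B.2 will remain true under any form of the
alternative hypothesis. In place of (B.4), we can then write

$$\begin{aligned}
\hat{M}_k^{(n)} X_{k-q} = M'_k X_{k-q} &+ \big[(\boldsymbol{\alpha}' - \tilde{\boldsymbol{\alpha}})^\top E(\boldsymbol{X}_{k-1} X_{k-q}) + (\mu' - \tilde{\mu})\, E(X_{k-q})\big] \\
&+ (\boldsymbol{\alpha}' - \hat{\boldsymbol{\alpha}}_n)^\top (\boldsymbol{X}_{k-1} X_{k-q} - E(\boldsymbol{X}_{k-1} X_{k-q})) + (\mu' - \hat{\mu}_n)(X_{k-q} - E(X_{k-q})) \\
&+ \big[(\tilde{\boldsymbol{\alpha}} - \hat{\boldsymbol{\alpha}}_n)^\top E(\boldsymbol{X}_{k-1} X_{k-q}) + (\tilde{\mu} - \hat{\mu}_n)\, E(X_{k-q})\big].
\end{aligned} \tag{B.9}$$

As in (B.4), the second term will be dominant here and in place of Lemma B.3 we have

$$(\boldsymbol{\theta}' - \tilde{\boldsymbol{\theta}})^\top \begin{bmatrix} E(\widetilde{\boldsymbol{X}}' \widetilde{X}'_{-q+1}) \\ E(\widetilde{X}'_{-q+1}) \end{bmatrix} = \frac{\psi_q}{\rho} > 0, \qquad (\boldsymbol{\theta}'' - \tilde{\boldsymbol{\theta}})^\top \begin{bmatrix} E(\widetilde{\boldsymbol{X}}'' \widetilde{X}''_{-q+1}) \\ E(\widetilde{X}''_{-q+1}) \end{bmatrix} = -\frac{\psi_q}{1-\rho} < 0.$$

We need (ii) in Lemma A.2 to be able to apply Lemma B.4 to the first and third terms in (B.9). The rest
of the adaptation is straightforward.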
If there is a change in multiple parameters, the analogue of Theorem 4.1 can still be stated, but the
sign of the dominant term depends nontrivially on the directions of the changes in these parameters (i.e., in
Lemma B.3 we can determine the limit but we cannot determine its sign without calculating it explicitly).
The one-sided test, however, relies on our knowledge of the sign of the dominant term. This warns us that
we should use the one-sided tests only when testing for change in a single parameter.



C Technical details 

C.1 Invertibility of the matrices $Q_n$, $C'$ and $C''$

In (2.2) we assumed that the matrix $Q_n$ is invertible, and similarly, in B.2 we assumed that $C'$ and $C''$ are
positive definite. The following two lemmas will show that these assumptions are correct.

Lemma C.1. For a homogeneous INAR(p) process with $\mu > 0$, for which either $\alpha_q \in (0,1)$ for some
$q \in \{1, 2, \ldots, p\}$ or $\sigma > 0$, we have

$$P(Q_n \text{ is singular}) \to 0.$$



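Proof. Since

$$Q_n = \sum_{i=1}^{n} \begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix} \begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix}^\top$$

is a sum of positive semidefinite matrices, it is positive semidefinite itself. Therefore, its singularity is
equivalent to the condition that for some nonzero $\boldsymbol{v} \in \mathbb{R}^{p+1}$ and every index $i \in \{1, \ldots, n\}$, we have

$$\begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix}^\top \boldsymbol{v} = 0,$$

which is equivalent to the condition that the linear span of

$$\left\{ \begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix},\ i = 1, \ldots, n \right\}$$

is a proper subspace of $\mathbb{R}^{p+1}$. Now, using the continuity of probability, our statement is equivalent to the
following:

$$P\left( \operatorname{span}\left\{ \begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix},\ i \in \mathbb{N} \right\} < \mathbb{R}^{p+1} \right) = 0, \tag{C.1}$$

where $<$ denotes proper subspace. For simplicity, throughout the proof we will use the notation

$$\boldsymbol{Y}_i := \begin{bmatrix} \boldsymbol{X}_{i-1} \\ 1 \end{bmatrix}.$$

It is clear that all values of $(\boldsymbol{Y}_i)_{i \in \mathbb{N}}$ fall into $\mathbb{N}_0^p \times \{1\}$. We introduce the following notation for the set of
spaces that can be spanned by the values of the process: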



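$$\mathcal{S} := \{ S < \mathbb{R}^{p+1} : S = \operatorname{span}\{\boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n\},\ \boldsymbol{y}_1, \boldsymbol{y}_2, \ldots, \boldsymbol{y}_n \in \mathbb{N}_0^p \times \{1\},\ n \in \mathbb{N} \}. \tag{C.2}$$

We can notice that $\mathcal{S}$ is countable. Indeed, every generating system of a subspace contains a basis, therefore
every subspace $S \in \mathcal{S}$ has a basis whose elements are from $\mathbb{N}_0^p \times \{1\}$. Such a basis is from $(\mathbb{N}_0^p \times \{1\})^k$ where
$k = \dim S$, and $0 \le k \le p$, and of course a basis corresponds to only one subspace. Now, since $\mathbb{N}_0^p \times \{1\}$ is
countable, $(\mathbb{N}_0^p \times \{1\})^k$ is also countable for any $k \in \mathbb{N}$, therefore $\bigcup_{k=0}^{p} (\mathbb{N}_0^p \times \{1\})^k$ is also countable, and so
is $\mathcal{S}$.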

Now we reformulate the event in (C.1):

$$\{\operatorname{span}\{\boldsymbol{Y}_i, i \in \mathbb{N}\} < \mathbb{R}^{p+1}\} = \bigcup_{S < \mathbb{R}^{p+1}} \{\operatorname{span}\{\boldsymbol{Y}_i, i \in \mathbb{N}\} = S\} = \bigcup_{S \in \mathcal{S}} \{\operatorname{span}\{\boldsymbol{Y}_i, i \in \mathbb{N}\} = S\} \subseteq \bigcup_{S \in \mathcal{S}} \{\operatorname{span}\{\boldsymbol{Y}_i, i \in \mathbb{N}\} \subseteq S\}. \tag{C.3}$$

Since the last union is countable, we can apply $\sigma$-subadditivity to show (C.1) if we can prove

$$P(\operatorname{span}\{\boldsymbol{Y}_i, i \in \mathbb{N}\} \subseteq S) = \lim_{n \to \infty} P(\operatorname{span}\{\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_n\} \subseteq S) = 0, \quad \forall S \in \mathcal{S}. \tag{C.4}$$

Here the first equality is trivial by the continuity of probability; it is the second equality which requires a
more detailed proof. The first step in the proof of (C.4) relies on the mechanism by which the components of
$\boldsymbol{Y}_{i+1}$ can be obtained from those of $\boldsymbol{Y}_i$. For a fixed $S \in \mathcal{S}$ the elements of S can be viewed as the solutions
of a homogeneous system of independent linear equations, i.e., $\boldsymbol{y} \in S$ if and only if

$$\sum_{j=1}^{p+1} \lambda_{i,j}\, y^{(j)} = 0, \quad i = 1, 2, \ldots, p + 1 - \dim S. \tag{C.5}$$

This representation is not unique, but we can fix one such representation. Now let us introduce

$$K(S) := \min\{ j \in \{1, 2, \ldots, p\} : \max_i |\lambda_{i,j}| > 0 \} \quad \text{and} \quad c(S) := \min\{ i \in \{1, 2, \ldots, p + 1 - \dim S\} : |\lambda_{i,K(S)}| > 0 \}.$$

This notation means that $K(S)$ is the first column index for which a nonzero coefficient appears in some
equation in (C.5) and the first nonzero coefficient in the $c(S)$-th equation has index $K(S)$. We also note that
$K(S) = p + 1$ is impossible because that would mean that the only equation is $y^{(p+1)} = 0$, which does not
hold for any element of $\mathbb{N}_0^p \times \{1\}$.

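Let us now fix an arbitrary $i \in \mathbb{N}$ and $\omega \in \Omega$ from our underlying probability space such that $\boldsymbol{Y}_i(\omega) =
\boldsymbol{y} = (y^{(1)}, \ldots, y^{(p)}, 1)^\top$. Then we have

$$\boldsymbol{Y}_{i+K(S)}(\omega) = (X_{i+K(S)-1}(\omega), \ldots, X_i(\omega), y^{(1)}, \ldots, y^{(p-K(S))}, 1)^\top \quad \text{(see (A.1))}.$$

Hence, for $\boldsymbol{Y}_{i+K(S)}(\omega) \in S$ to hold, it is necessary (but usually not sufficient) that $\boldsymbol{Y}_{i+K(S)}(\omega)$ satisfy the
$c(S)$-th equation in (C.5), i.e.,

$$\sum_{j=1}^{K(S)} \lambda_{c(S),j}\, X_{i+K(S)-j}(\omega) + \sum_{j=K(S)+1}^{p} \lambda_{c(S),j}\, y^{(j-K(S))} + \lambda_{c(S),p+1} = \lambda_{c(S),K(S)}\, X_i(\omega) + \sum_{j=K(S)+1}^{p} \lambda_{c(S),j}\, y^{(j-K(S))} + \lambda_{c(S),p+1} = 0.$$

This linear equation has a unique solution for $X_i(\omega)$ because $\lambda_{c(S),K(S)} \ne 0$. Let us denote this unique
solution by $m(\boldsymbol{y}, S)$ (by simple algebraic considerations one can see that this quantity does not depend on
the representation in (C.5), but this is not necessary to our proof). Therefore, if $X_i(\omega) \ne m(\boldsymbol{y}, S)$, then
$\omega \notin \{\boldsymbol{Y}_{i+K(S)} \in S, \boldsymbol{Y}_i = \boldsymbol{y}\}$, hence

$$\{\boldsymbol{Y}_{i+K(S)} \in S, \boldsymbol{Y}_i = \boldsymbol{y}\} \subseteq \{X_i = m(\boldsymbol{y}, S), \boldsymbol{Y}_i = \boldsymbol{y}\} \quad \forall i \in \mathbb{N},\ \forall \boldsymbol{y} \in \mathbb{N}_0^p \times \{1\}.$$

If $m(\boldsymbol{y}, S) \notin \mathbb{N}_0$, then we have $\{\boldsymbol{Y}_{i+K(S)} \in S, \boldsymbol{Y}_i = \boldsymbol{y}\} = \emptyset$.

Now we will consider the second event in (C.4) with n replaced by $n + K(S)$, and split it according to the initial
value of the process:

$$\{\operatorname{span}\{\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_{n+K(S)}\} \subseteq S\} = \bigcup_{\boldsymbol{y}_1 \in \mathbb{N}_0^p \times \{1\}} \{\operatorname{span}\{\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_{n+K(S)}\} \subseteq S, \boldsymbol{Y}_1 = \boldsymbol{y}_1\}. \tag{C.6}$$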
The individual events in the union can be transformed in the following way:



{span {Y,,Y2, Y„+KiS)} C S,Y, = y,} = {Y, G S,Y2 G S, . . . , Y„+k(S) eS,Y,=y,} 
= {Yi e 5, 1^2 e 5, . . . , Yx(s) G S,Yi = y^, Yi_^_x(s) € 5, ... , Y^+KiS) € 

= {Y,eS,Y2eS,..., Yk(s) &S,Y,= y„ Y^ = y^, Y^+xiS) 

■S,..., Yk{s) &S,Yi = yi, Y2 = y2, ^2 = m{y2, S), I's+kcs) e 



Y2 € S, . . . , Yj^j^s) e S,Yi — y^, Xi — m{yi, S),Y2+K{s) e S*, . . . , Yn+K(S) 

G S,..., Yn+K{S) e S} 

€S} 



C{Yi€S,Y2€ 

= {Yi€ S,Y2€ S,..., Yk{s) €S,Yi = yi, Y2 = y2, Y3 = y^, Y^+ki^s) eS,..., 



S,. . ■,Yn_^_x{s) 
eS} 



C{YieS,Y2GS,...,YKis)^S,Yi = y„Y2 = y2,Y3 = ys,.-.,Yn = yn}, 



where the sequence (yj)"=i is defined by the recursion 



Vi = 



"^(y^-l,5') 

i/j-l 



Vi-i 



(C.7) 



i = 2, 3, . . . , n. 



(C.8) 



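We would like to represent the probability of the last event in (C.7) as a product of transition probabilities.
For this we first need to determine whether the event is empty, and now we will give two necessary conditions
on $\boldsymbol{y}_1$ for its nonemptiness. The first condition is, clearly, that all elements of the sequence defined in (C.8)
fall into $\mathbb{N}_0^p \times \{1\}$. We will not investigate this condition in any further detail; we only note that this imposes
a deterministic condition on $\boldsymbol{y}_1$. Another deterministic condition is that $\boldsymbol{Y}_1 \in S, \boldsymbol{Y}_2 \in S, \ldots, \boldsymbol{Y}_{K(S)} \in S$
should all hold. Because the first $K(S) - 1$ coefficients are all zero in any equation in (C.5) and $\boldsymbol{Y}_1$ contains
all the components indexed $K(S)$ or greater in $\boldsymbol{Y}_1, \ldots, \boldsymbol{Y}_{K(S)}$, the validity of these inclusions is determined
by $\boldsymbol{y}_1$ alone. This imposes the second (again, deterministic) condition on $\boldsymbol{y}_1$. If we denote the set of $\boldsymbol{y}_1$
which fulfill both these conditions by $U_n$, we have from (C.6) and (C.7),

$$\{\operatorname{span}\{\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_{n+K(S)}\} \subseteq S\} \subseteq \bigcup_{\boldsymbol{y}_1 \in U_n} \{\boldsymbol{Y}_1 = \boldsymbol{y}_1, \boldsymbol{Y}_2 = \boldsymbol{y}_2, \boldsymbol{Y}_3 = \boldsymbol{y}_3, \ldots, \boldsymbol{Y}_n = \boldsymbol{y}_n\},$$

hence by $\sigma$-subadditivity ($U_n$ is clearly countable),

$$P(\operatorname{span}\{\boldsymbol{Y}_1, \boldsymbol{Y}_2, \ldots, \boldsymbol{Y}_{n+K(S)}\} \subseteq S) \le \sum_{\boldsymbol{y}_1 \in U_n} P(\boldsymbol{Y}_1 = \boldsymbol{y}_1, \boldsymbol{Y}_2 = \boldsymbol{y}_2, \ldots, \boldsymbol{Y}_n = \boldsymbol{y}_n) = \sum_{\boldsymbol{y}_1 \in U_n} P(\boldsymbol{Y}_1 = \boldsymbol{y}_1)\, p_{\boldsymbol{y}_1, \boldsymbol{y}_2}\, p_{\boldsymbol{y}_2, \boldsymbol{y}_3} \cdots p_{\boldsymbol{y}_{n-1}, \boldsymbol{y}_n}, \tag{C.9}$$

where $p_{\boldsymbol{u}, \boldsymbol{v}}$ denotes the transition probability of $\boldsymbol{Y}$ from $\boldsymbol{u}$ to $\boldsymbol{v}$. Because the sets $(U_n)_{n \in \mathbb{N}}$ form a nonincreasing
sequence (the second condition does not depend on n, and the first one becomes more restrictive as
n increases), it is sufficient to show that for any sequence $(\boldsymbol{y}_i)_{i \in \mathbb{N}} \in (\mathbb{N}_0^p \times \{1\})^{\mathbb{N}}$ we have

$$\lim_{n \to \infty} p_{\boldsymbol{y}_1, \boldsymbol{y}_2}\, p_{\boldsymbol{y}_2, \boldsymbol{y}_3} \cdots p_{\boldsymbol{y}_{n-1}, \boldsymbol{y}_n} = 0, \tag{C.10}$$

and from this we will get (C.4). For the proof of (C.10) we will need to establish upper bounds for the
transition probabilities. We will first consider the case when $\sigma > 0$, i.e., when the innovation distribution is
nondegenerate.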

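Let us fix $\boldsymbol{u}, \boldsymbol{v} \in \mathbb{N}_0^p \times \{1\}$ so that $v^{(2)} = u^{(1)}, v^{(3)} = u^{(2)}, \ldots, v^{(p)} = u^{(p-1)}$, $v^{(p+1)} = u^{(p+1)} = 1$. We would like to
give an upper bound for $p_{\boldsymbol{u}, \boldsymbol{v}}$.

We have for every $i \in \mathbb{N}$ and any $m \in \mathbb{N}_0$,

$$P\Bigg(\boldsymbol{Y}_{i+1} = \boldsymbol{v} \,\Bigg|\, \boldsymbol{Y}_i = \boldsymbol{u}, \sum_{j=1}^{p} \sum_{\ell=1}^{u^{(j)}} \xi_{j,i,\ell} = m\Bigg) = P\Bigg(\varepsilon_i = v^{(1)} - m \,\Bigg|\, \boldsymbol{Y}_i = \boldsymbol{u}, \sum_{j=1}^{p} \sum_{\ell=1}^{u^{(j)}} \xi_{j,i,\ell} = m\Bigg) = P(\varepsilon_i = v^{(1)} - m).$$

Applying the law of total probability we get

$$p_{\boldsymbol{u}, \boldsymbol{v}} = \sum_{m \in \mathbb{N}_0} P(\varepsilon_i = v^{(1)} - m)\, P\Bigg(\sum_{j=1}^{p} \sum_{\ell=1}^{u^{(j)}} \xi_{j,i,\ell} = m\Bigg) \le \max_{k \in \mathbb{N}_0} P(\varepsilon_1 = k) < 1, \tag{C.11}$$

since the innovation distribution was nondegenerate. Therefore, if $\sigma > 0$, then we have a uniform upper
bound on the transition probabilities, which implies (C.10) immediately.

The other case is if the innovation distribution is degenerate. First we note that in this case the innovation
is equal to its expectation $\mu > 0$ almost surely, so that all components of $\boldsymbol{Y}_i$ are positive for $i \ge p + 1$.
According to the conditions, there is a coefficient $\alpha_q$, $q \in \{1, 2, \ldots, p\}$ such that $0 < \alpha_q < 1$. Similarly to the
previous reasoning, if additionally we suppose that all components of $\boldsymbol{u}$ and $\boldsymbol{v}$ are greater or equal to $\mu$, we
have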



= P 



Y,+i = v\Yi = u,ii 
( 

V 



J2 J2 ^j'*'^ 



j=i 1=1 



/ 



^q,i,uM =v^'^'^ -m\Yi =u,iJ.- 



j=i e=i 



+ 



m 



E ^9,^/ = 



Here we note that $\xi_{q,i,u^{(q)}}$ is a meaningful notation because $u^{(q)} \ge \mu$ and is a positive integer. Applying
the law of total probability again, we have that

$$p_{\boldsymbol{u}, \boldsymbol{v}} \le \max(\alpha_q, 1 - \alpha_q) < 1,$$

which again gives a uniform upper bound for the transition probabilities and yields (C.10). With this our
proof is complete. □

It may be worth noting that Lemma C.1 imposes very weak conditions on the process: we only neglect
the trivial case when all innovation and offspring distributions are degenerate. Also, the lemma does not
require that the process start from zero; the initial distribution can be arbitrarily chosen on $\mathbb{N}_0^p$. This gives
us a chance to prove two important corollaries.

Corollary C.2. For an INAR(p) process under the alternative hypothesis satisfying the assumptions of
Lemma C.1 both before and after the change, and $\tau = \max(1, \lfloor n\rho \rfloor)$ for some constant $\rho > 0$, we have

$$P(Q_n \text{ is singular}) \to 0.$$

Proof. To show this statement we only need to note that due to Lemma C.1 we have

$$P(Q_{\lfloor n\rho \rfloor} \text{ is singular}) \to 0,$$

and clearly

$$\{Q_n \text{ is singular}\} \subseteq \{Q_{\lfloor n\rho \rfloor} \text{ is singular}\}$$

due to the reasoning at the beginning of the proof of Lemma C.1. □

Remark. The conditions of this corollary could be greatly relaxed. Actually, it can be shown to be true for
any change point sequence $\tau_n$, but a proper formulation is tedious and not a goal of this paper.

Corollary C.3. Under the conditions of Theorem 4.1 both $C'$ and $C''$ are positive definite.

Proof. First we prove the statement for $C'$. We note that Lemma C.1 did not impose any conditions on the initial
distribution of the process $\boldsymbol{Y}$, therefore we can start the process from its stationary distribution (the existence
of which is a trivial corollary of the existence of such a distribution for $\boldsymbol{X}$ before the change). Now, the
singularity of $C'$ is equivalent to the condition

$$P\left( \begin{bmatrix} \widetilde{\boldsymbol{X}}' \\ 1 \end{bmatrix}^\top \boldsymbol{v} = 0 \right) = 1$$

for some nonzero $\boldsymbol{v} \in \mathbb{R}^{p+1}$. Let us now suppose that the stationary distribution of $\boldsymbol{Y}$ is concentrated on a proper subspace $S < \mathbb{R}^{p+1}$.
From (C.4), however, we conclude that the probability of the process remaining in S forever is zero. As
the distribution of $\boldsymbol{Y}_n$ is the stationary distribution for every time n, this is an immediate contradiction.
Therefore $C'$ is nonsingular, but since it is a covariance matrix, it is positive semidefinite, therefore it has
to be positive definite. The proof is the same for $C''$. □

C.2 The conditional moments of $M_k$

We shall now derive several moments of $M_k$ conditionally on $\mathcal{F}_{k-1}$ (this calculation is a reproduction of that
in T. Szabo (2011a)). Let us write $M_k$ in the form

$$M_k = \sum_{j=1}^{X_{k-1}} (\xi_{1,k,j} - \alpha_1) + \sum_{j=1}^{X_{k-2}} (\xi_{2,k,j} - \alpha_2) + \ldots + \sum_{j=1}^{X_{k-p}} (\xi_{p,k,j} - \alpha_p) + (\varepsilon_k - \mu).$$

All the terms on the right hand side have zero mean and are independent of each other conditionally on
$\mathcal{F}_{k-1}$, therefore

$$E(M_k^2 \mid \mathcal{F}_{k-1}) = \alpha_1(1 - \alpha_1) X_{k-1} + \ldots + \alpha_p(1 - \alpha_p) X_{k-p} + \sigma^2.$$
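The second-moment formula above can be checked exactly on a small case. The sketch below is an illustration under assumptions of its own choosing: an INAR(2) process with a hypothetical three-point innovation distribution and fixed lagged values. It builds the exact conditional distribution of $X_k$ by convolving binomial laws with the innovation law and compares its variance (which equals $E(M_k^2 \mid \mathcal{F}_{k-1})$, since $M_k$ is $X_k$ minus its conditional mean) with the closed form.

```python
from math import comb


def binomial_pmf(n, a):
    """Probability mass function of a Binomial(n, a) variable on 0..n."""
    return [comb(n, j) * a ** j * (1 - a) ** (n - j) for j in range(n + 1)]


def convolve(p, q):
    """Distribution of the sum of two independent nonnegative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def variance(pmf):
    """Variance of a distribution given by its pmf on 0, 1, 2, ..."""
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pmf))


def conditional_variance_exact(alphas, lags, innovation_pmf):
    """Var(X_k | X_{k-1}, ..., X_{k-p}) from the exact conditional distribution."""
    pmf = innovation_pmf
    for a, x in zip(alphas, lags):
        pmf = convolve(pmf, binomial_pmf(x, a))
    return variance(pmf)


def conditional_variance_formula(alphas, lags, sigma2):
    """Closed form: sum_i alpha_i (1 - alpha_i) X_{k-i} + sigma^2."""
    return sum(a * (1 - a) * x for a, x in zip(alphas, lags)) + sigma2


alphas = [0.3, 0.2]              # illustrative coefficients
lags = [4, 2]                    # illustrative values of X_{k-1}, X_{k-2}
innovation_pmf = [0.5, 0.3, 0.2]  # hypothetical innovation law on {0, 1, 2}

exact = conditional_variance_exact(alphas, lags, innovation_pmf)
formula = conditional_variance_formula(alphas, lags, variance(innovation_pmf))
print(exact, formula)
```

Both numbers agree up to floating-point rounding, reflecting that the conditional variance is linear in the lagged values.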
Similarly,

p p 
^{M^\Tk-i) = ^E(te,i,i - aif)Xk-i + 3 ^ E((^i,i,i - ai)'(0,i,i - 

i=l i,j=l,i^j 

+ 6 (^*) E2(te,i,i - aO') + 6 Xk-i E((^,,i,i - - m)') + E((ei - nf) 

i=l ^ ^ i=l 

P 

= aJXfe_i + 3 ^ aj(l - aj)aj(l - Q!j)XjXj 
i=i ^ ^ 